Topics in Applied Physics 


Volume 134 


Series Editors 
Young Pak Lee, Physics, Hanyang University, Seoul, Korea (Republic of) 
Paolo M. Ossi, NEMAS - WIBIDI Lab, Politecnico di Milano, Milano, Italy 


David J. Lockwood, Metrology Research Center, National Research Council 
of Canada, Ottawa, ON, Canada 


Kaoru Yamanouchi, Department of Chemistry, The University of Tokyo, Tokyo, 
Japan 


Topics in Applied Physics is a well-established series of review books, each of 
which presents a comprehensive survey of a selected topic within the area of 
applied physics. Edited and written by leading research scientists in the field 
concerned, each volume contains review contributions covering the various aspects 
of the topic. Together these provide an overview of the state of the art in the 
respective field, extending from an introduction to the subject right up to the 
frontiers of contemporary research. 

Topics in Applied Physics is addressed to all scientists at universities and in 
industry who wish to obtain an overview and to keep abreast of advances in applied 
physics. The series also provides easy but comprehensive access to the fields for 
newcomers starting research. 

Contributions are specially commissioned. The Managing Editors are open to 
any suggestions for topics coming from the community of applied physicists no 
matter what the field and encourage prospective book editors to approach them with 
ideas. 

2018 Impact Factor: 0.746 


More information about this series at http://www.springer.com/series/560 


Tim Salditt · Alexander Egner · D. Russell Luke
Editors 


Nanoscale Photonic Imaging 


Springer Open


Editors 


Tim Salditt
Institut für Röntgenphysik
Universität Göttingen
Göttingen, Germany

Alexander Egner
Laser Laboratorium
University of Göttingen
Göttingen, Germany

D. Russell Luke
Institut für Numerische und Angewandte Mathematik
Universität Göttingen
Göttingen, Germany


ISSN 0303-4216 ISSN 1437-0859 (electronic) 
Topics in Applied Physics 
ISBN 978-3-030-34412-2 ISBN 978-3-030-34413-9 (eBook) 


https://doi.org/10.1007/978-3-030-34413-9 


© The Editor(s) (if applicable) and The Author(s) 2020. This book is an open access publication. 
Open Access This book is licensed under the terms of the Creative Commons Attribution 4.0 
International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation,
distribution and reproduction in any medium or format, as long as you give appropriate credit to
the original author(s) and the source, provide a link to the Creative Commons license and indicate if 
changes were made. 

The images or other third party material in this book are included in the book’s Creative Commons 
license, unless indicated otherwise in a credit line to the material. If material is not included in the book’s 
Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the 
permitted use, you will need to obtain permission directly from the copyright holder. 

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication
does not imply, even in the absence of a specific statement, that such names are exempt from the
relevant protective laws and regulations and therefore free for general use. 

The publisher, the authors and the editors are safe to assume that the advice and information in this 
book are believed to be true and accurate at the date of publication. Neither the publisher nor the 
authors or the editors give a warranty, express or implied, with respect to the material contained herein or 
for any errors or omissions that may have been made. The publisher remains neutral with regard to 
jurisdictional claims in published maps and institutional affiliations. 


This Springer imprint is published by the registered company Springer Nature Switzerland AG 
The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland 


Preface 


The word ‘Nano’ has been around for a long time. It became a topic of significant 
interest in the eighties of the last century, after instruments such as the scanning 
tunneling microscope and the atomic force microscope had been invented. The 
‘nanoscale’ was probed based on electric currents through a tunneling tip or by 
measuring the forces with a cantilever. In other words, the ‘room at the bottom’ was 
conquered not by ‘seeing’, but rather by ‘feeling’. Too strong was the belief that 
optical imaging was limited to the microscale due to the diffraction barrier. But the 
insight that photonics and nanoscale also make a perfect match followed only 
shortly after the advent of the scanning tunneling and atomic force microscopes. 
Around the turn of the millennium it became broadly accepted that plenty of ‘nano’ 
can be done with photons: Single molecule spectroscopy had been established, 
fluorescence correlation spectroscopy was emerging, and above all there was a new 
way to turn microscopes into nanoscopes based on optical switching, as pioneered 
by Stefan Hell here in Göttingen. While very few physicists cared about optical 
microscopes before, a time of rapid development had now set in. At the same time, 
a long-standing dream to realize X-ray microscopy was empowered by coherent 
optics and computational phase retrieval. 

Pairing up optical and short wavelength to extend the scales of ‘imaging’, 
research teams in Göttingen set out for new discoveries. But how to empower their 
vessels? The solution was found by mathematics. Using results from inverse 
problems, stochastics, and optimization theory, new and bountiful shores were 
discovered, and photonic data was turned into useful information.... 

As we now come back from our expeditions funded for the last 12 years by the 
German Science Foundation (DFG) through SFB755 Nanoscale Photonic Imaging, 
we do not want to keep all the treasures for ourselves. The current book is a 
compilation of tutorials, experiments and experiences, and a compendium for further
reading. In addition to the contributing authors and Angela Lehee at Springer,
we are grateful to Leon Lohse and Shahroz Shahjahan for helping to keep this project
on track. Above all we would like to express our deepest gratitude to Eva Hetzel
who has been with this collaborative research center for the duration and has been
essential to keeping the expedition on track, on budget and on time—all with grace 
and joyful optimism. 

Now, let us dive deep into the nanoscale, and not just scratch at its surface! 


Göttingen, Germany Tim Salditt 
Alexander Egner 
D. Russell Luke 
Part I
Fundamentals and Tutorials 


Chapter 1
STED Nanoscopy


Alexander Egner, Claudia Geisler and René Siegmund 


The very first step of every understanding of the Microscope is ... 
to become familiar with the idea that it is a thing sui generis 
— Ernst Abbe [1] 


1.1 Fundamentals of Fluorescence Microscopy 


This section will present the basics of fluorescence microscopy. Starting from the 
intensity distribution within the focal spot of an objective lens, we will discuss the 
image formation and derive the classical formula of the resolution limit. Furthermore, 
we will introduce the principle of confocal detection. 


1.1.1 Vectorial Diffraction Theory and Intensity Distribution 
Within the Focal Spot 


The complex electric vector field $\mathcal{E}(\mathbf{r})$ in the focal region of an optical system
can be expressed in terms of a modified Huygens-Fresnel principle as the coherent
superposition of secondary plane waves at the exit pupil [2, 3]

$$\mathcal{E}(\mathbf{r}) = -\frac{ik}{2\pi} \iint_{\Omega} \sqrt{R_1 R_2}\; \mathbf{E}_s(\mathbf{s})\, e^{ik\,\mathbf{s}\cdot\mathbf{r}}\, d\Omega, \tag{1.1}$$

where $\Omega$ is the solid angle, $\mathbf{E}_s$ is the complex amplitude of the secondary plane waves,
$R_1$ and $R_2$ are the principal radii of curvature of the wavefront at the exit pupil and
$\mathbf{s}$ is the unit vector in the respective direction of propagation, see Fig. 1.1a. The
wave number $k$ is given by $k = 2\pi n/\lambda_0$, with $n$ being the refractive index and $\lambda_0$
the vacuum wavelength. Note that the geometric focus is at position $\mathbf{r} = (0, 0, 0)$.

In order to derive $\mathcal{E}(\mathbf{r})$ for an aplanatic imaging system, i.e. one that is axially
stigmatic and obeys the sine condition, such as the objective lens of a microscope,
$\mathbf{E}_s(\mathbf{s})$, $R_1$ and $R_2$ have to be determined [4].

Fig. 1.1 Secondary plane waves as well as strength and polarization of a focused wavefront. a
A secondary plane wave can be defined for each point G of a wavefront W leaving the pupil of
an optical imaging system. The wavefront of the secondary plane wave is tangential to W in G.
b When focusing through a lens, the projection of the electric vector field onto the meridional plane
changes its direction of polarization (vectors $\mathbf{g}_0$ and $\mathbf{g}_s$). The part of the vector field orthogonal
to the meridional plane retains its polarization direction (vectors $\mathbf{g}_0^{\perp}$ and $\mathbf{g}_s^{\perp}$). Due to the transition
from a plane to a spherical wavefront, the strength of the electric field within associated surface
segments ($dS_0$ and $dS_s$) changes such that the energy passing through the surface elements remains
constant

The typical scenario when focusing with an infinity-corrected lens is shown in
Fig. 1.1b. Without loss of generality, we first assume that the light at the entrance
pupil of the objective is linearly polarized in the x-direction

$$\mathbf{E}_0(x_0, y_0) = E_0\, \mathbf{e}_x\, e^{iP(x_0, y_0)}, \tag{1.2}$$

where $E_0$ is the (real-valued) amplitude of the electric field, $\mathbf{e}_x$ is the unit vector in
x-direction and $P$ is the pupil function which encodes the phase distribution of the
electric field [5]. For arbitrary polarization states, $\mathcal{E}(\mathbf{r})$ can then be calculated by
the coherent superposition of several solutions of $\mathcal{E}(\mathbf{r})$ for correspondingly linearly
polarized vector fields at the entrance pupil. The case of unpolarized light is obtained
by averaging over all possible polarization states.
According to Fig. 1.1b, the surface segment at the entrance pupil

$$dS_0 = r_0\, d\phi_0\, dr_0 \tag{1.3}$$

will be transformed by the objective lens to a surface segment at the exit pupil

$$dS_s = f^2 \sin\theta_s\, d\phi_s\, d\theta_s, \tag{1.4}$$

where $r_0$ is the distance of the surface segment from the optical axis and $f$ is the
focal length of the lens. As the lens obeys the sine condition

$$r_0 = f \sin\theta_s \tag{1.5}$$

and the intensity law of geometrical optics [6]

$$E_0^2\, dS_0 = E_s^2\, dS_s \tag{1.6}$$

has to be fulfilled, the amplitude of the electric field at the exit pupil is given by

$$E_s = E_0 \sqrt{\frac{r_0\, dr_0}{f \sin\theta_s\, f\, d\theta_s}} = E_0 \sqrt{\cos\theta_s}, \tag{1.7}$$

where $d\phi_0 = d\phi_s$ and, from (1.5), $dr_0 = f \cos\theta_s\, d\theta_s$ have been used.


To determine the polarization of the electric field at the exit pupil, it is advisable
to introduce two unit vectors for each light ray passing through the objective lens

$$\mathbf{g}_0 = \begin{pmatrix} \cos\phi_0 \\ \sin\phi_0 \\ 0 \end{pmatrix} \quad\text{and}\quad \mathbf{g}_s = \begin{pmatrix} \cos\theta_s \cos\phi_s \\ \cos\theta_s \sin\phi_s \\ \sin\theta_s \end{pmatrix} \tag{1.8}$$

in the corresponding meridional plane and two unit vectors

$$\mathbf{g}_0^{\perp} = \begin{pmatrix} -\sin\phi_0 \\ \cos\phi_0 \\ 0 \end{pmatrix} \quad\text{and}\quad \mathbf{g}_s^{\perp} = \begin{pmatrix} -\sin\phi_s \\ \cos\phi_s \\ 0 \end{pmatrix} \tag{1.9}$$

which are orthogonal to the meridional plane, Fig. 1.1b. Note that $\phi_0 = \phi_s$. When the
light rays pass through the lens, the portions of their electric fields originally pointing
in the $\mathbf{g}_0$ directions are re-polarized in the $\mathbf{g}_s$ directions and the portions pointing in
the $\mathbf{g}_0^{\perp}$ directions do not change their polarization. Hence, using (1.2), (1.7), (1.8) and
(1.9), we can write for the amplitude of the secondary plane waves

$$\mathbf{E}_s = \sqrt{\cos\theta_s} \left( (\mathbf{E}_0 \cdot \mathbf{g}_0)\, \mathbf{g}_s + (\mathbf{E}_0 \cdot \mathbf{g}_0^{\perp})\, \mathbf{g}_s^{\perp} \right), \tag{1.10}$$

which after expansion results in

$$\mathbf{E}_s = \sqrt{\cos\theta_s}\, E_0 \begin{pmatrix} \cos\theta_s + (1 - \cos\theta_s) \sin^2\phi_s \\ (\cos\theta_s - 1) \cos\phi_s \sin\phi_s \\ \sin\theta_s \cos\phi_s \end{pmatrix} e^{iP(x_0, y_0)}. \tag{1.11}$$

Fig. 1.2 Electric field components pointing in x-, y- and z-direction for an initially x-polarized
planar wavefront after passing a lens


Note that the influence of the lens on the phase of the electric field has already
been fully accounted for in the presented geometry and that $x_0 = f \sin\theta_s \cos\phi_s$
and $y_0 = f \sin\theta_s \sin\phi_s$. The electric field components in all three directions
are depicted in Fig. 1.2. Note that the maximum strength of $E_{s,z}$ is about half of
the maximum strength of $E_{s,x}$ and that $E_{s,y}$ is again a factor of three lower. While
$E_{s,x}$ points in the same direction over the entire aperture, both the y- and the
z-component change their signs. As we will see later, this has a direct influence on the
distribution of the polarization within the focus. In particular, the y- and z-components
interfere destructively on the optical axis (x = y = 0).
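The relative strengths quoted above can be checked with a short numerical scan of the exit pupil. The following is only a sketch: it evaluates the three components of (1.11) for a flat pupil phase (P = 0) and E_0 = 1, and the semi-aperture angle of 67.26° is an assumed value chosen to match the NA 1.4 oil immersion objective used for the figures of this chapter. The function names are illustrative.

```python
import math

def pupil_field(theta_s, phi_s, e0=1.0):
    """Return (E_x, E_y, E_z) of (1.11) for a flat pupil phase, P = 0."""
    c, s = math.cos(theta_s), math.sin(theta_s)
    apodization = math.sqrt(c) * e0            # factor sqrt(cos(theta_s)) from (1.7)
    e_x = apodization * (c + (1.0 - c) * math.sin(phi_s) ** 2)
    e_y = apodization * (c - 1.0) * math.cos(phi_s) * math.sin(phi_s)
    e_z = apodization * s * math.cos(phi_s)
    return e_x, e_y, e_z

def max_components(alpha, steps=200):
    """Scan the exit pupil up to the angle alpha; return max |E_x|, |E_y|, |E_z|."""
    best = [0.0, 0.0, 0.0]
    for i in range(steps + 1):
        theta = alpha * i / steps
        for j in range(steps):
            phi = 2.0 * math.pi * j / steps
            for idx, value in enumerate(pupil_field(theta, phi)):
                best[idx] = max(best[idx], abs(value))
    return tuple(best)

# Assumed semi-aperture angle (NA 1.4, n = 1.518); not prescribed by (1.11) itself.
alpha = math.radians(67.26)
ex_max, ey_max, ez_max = max_components(alpha)
print(f"max |E_x| = {ex_max:.3f}, max |E_z| = {ez_max:.3f}, max |E_y| = {ey_max:.3f}")
```

Under these assumptions the z-component peaks at roughly 0.6 of the x-component and the y-component at roughly 0.2, in line with the ratios stated in the text.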

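If we assume that the entrance pupil of the objective is homogeneously illuminated
and if we use that $R_1 = R_2 = f$ applies to the principal radii in (1.1), we derive for
the electric vector field in the focal region

$$\begin{aligned}
\mathcal{E}_x(\mathbf{r}) &= -\frac{iA}{\pi} \int_0^{\alpha}\!\!\int_0^{2\pi} \sqrt{\cos\theta_s}\, \sin\theta_s \left\{ \cos\theta_s + (1 - \cos\theta_s) \sin^2\phi_s \right\} e^{i\left[k\,\mathbf{r}\cdot\mathbf{s} + P(\theta_s, \phi_s)\right]}\, d\phi_s\, d\theta_s \\
\mathcal{E}_y(\mathbf{r}) &= \frac{iA}{\pi} \int_0^{\alpha}\!\!\int_0^{2\pi} \sqrt{\cos\theta_s}\, \sin\theta_s \left\{ (1 - \cos\theta_s) \cos\phi_s \sin\phi_s \right\} e^{i\left[k\,\mathbf{r}\cdot\mathbf{s} + P(\theta_s, \phi_s)\right]}\, d\phi_s\, d\theta_s \\
\mathcal{E}_z(\mathbf{r}) &= \frac{iA}{\pi} \int_0^{\alpha}\!\!\int_0^{2\pi} \sqrt{\cos\theta_s}\, \sin\theta_s \left\{ \sin\theta_s \cos\phi_s \right\} e^{i\left[k\,\mathbf{r}\cdot\mathbf{s} + P(\theta_s, \phi_s)\right]}\, d\phi_s\, d\theta_s
\end{aligned} \tag{1.12}$$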
where

$$A = \frac{k f E_0}{2} \tag{1.13}$$

is a constant and $\alpha$ is the semi-aperture angle of the objective lens. Please note that
$\theta_s$ is defined in negative z-direction, therefore the sign of the $\mathcal{E}_z$ component has to
be changed. Usually, the numerical aperture

$$\mathrm{NA} = n \sin\alpha \tag{1.14}$$

is specified instead of $\alpha$ to indicate the aperture angle of an objective lens, in which
$n$ is the refractive index of the immersion medium.
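As a small worked example of (1.14), the semi-aperture angle can be recovered from the numerical aperture. The sketch below uses the NA 1.4 oil immersion objective (n = 1.518) quoted in the figure captions of this chapter; the function name is illustrative.

```python
import math

def semi_aperture_angle(na, n):
    """Invert NA = n * sin(alpha) from (1.14); returns alpha in radians."""
    if not 0.0 < na <= n:
        raise ValueError("NA must be positive and cannot exceed the refractive index n")
    return math.asin(na / n)

alpha = semi_aperture_angle(na=1.4, n=1.518)
alpha_deg = math.degrees(alpha)
print(f"semi-aperture angle: {alpha_deg:.2f} degrees")
```

For these values the angle is a little above 67°, i.e. the objective collects light over a large part of the half space, which is why the vectorial treatment above is needed.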

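For an incident plane wave, $P(\theta_s, \phi_s)$ is constant (e.g. 0), the integration with
respect to $\phi_s$ can be carried out, and the analytic solution of $\mathcal{E}(\mathbf{r})$ is [4]

$$\begin{aligned}
\mathcal{E}_x(\mathbf{r}) &= -iA \left( I_0 + I_2 \cos 2\phi_r \right) \\
\mathcal{E}_y(\mathbf{r}) &= -iA\, I_2 \sin 2\phi_r \\
\mathcal{E}_z(\mathbf{r}) &= -2A\, I_1 \cos\phi_r,
\end{aligned} \tag{1.15}$$

where the field is expressed in spherical coordinates $\mathbf{r} = (r, \theta_r, \phi_r)$ and the diffraction
integrals are defined as

$$\begin{aligned}
I_0(\mathbf{r}) &= \int_0^{\alpha} \sqrt{\cos\theta_s}\, \sin\theta_s\, (1 + \cos\theta_s)\, J_0(kr \sin\theta_s \sin\theta_r)\, e^{ikr \cos\theta_s \cos\theta_r}\, d\theta_s \\
I_1(\mathbf{r}) &= \int_0^{\alpha} \sqrt{\cos\theta_s}\, \sin^2\theta_s\, J_1(kr \sin\theta_s \sin\theta_r)\, e^{ikr \cos\theta_s \cos\theta_r}\, d\theta_s \\
I_2(\mathbf{r}) &= \int_0^{\alpha} \sqrt{\cos\theta_s}\, \sin\theta_s\, (1 - \cos\theta_s)\, J_2(kr \sin\theta_s \sin\theta_r)\, e^{ikr \cos\theta_s \cos\theta_r}\, d\theta_s
\end{aligned} \tag{1.16}$$

and $J_n$ are the Bessel functions of the first kind and order $n$. The overall intensity in
the vicinity of the focal spot is given by

$$I(\mathbf{r}) = |\mathcal{E}_x(\mathbf{r})|^2 + |\mathcal{E}_y(\mathbf{r})|^2 + |\mathcal{E}_z(\mathbf{r})|^2. \tag{1.17}$$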
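To make (1.15) to (1.17) concrete, the diffraction integrals can be evaluated numerically. The sketch below is restricted to the focal plane (θ_r = π/2, so the exponential factor equals one), sets A = 1, and uses a simple midpoint rule; the Bessel functions are computed from their standard integral representation so that only the Python standard library is needed. Grid sizes, function names and the probe distance of 150 nm are arbitrary choices made for illustration.

```python
import math

def bessel_j(order, x, terms=64):
    """J_order(x) from (1/pi) * integral_0^pi cos(order*t - x*sin(t)) dt (midpoint rule)."""
    step = math.pi / terms
    total = 0.0
    for m in range(terms):
        t = (m + 0.5) * step
        total += math.cos(order * t - x * math.sin(t))
    return total / terms

def diffraction_integrals(rho, k, alpha, steps=200):
    """I0, I1, I2 of (1.16) in the focal plane (theta_r = pi/2), midpoint rule."""
    d_theta = alpha / steps
    i0 = i1 = i2 = 0.0
    for m in range(steps):
        theta = (m + 0.5) * d_theta
        c, s = math.cos(theta), math.sin(theta)
        weight = math.sqrt(c) * s * d_theta
        arg = k * rho * s
        i0 += weight * (1.0 + c) * bessel_j(0, arg)
        i1 += weight * s * bessel_j(1, arg)
        i2 += weight * (1.0 - c) * bessel_j(2, arg)
    return i0, i1, i2

def focal_plane_intensity(x, y, na=1.4, n=1.518, wavelength=640e-9, amp=1.0):
    """Intensity (1.17) at (x, y, 0) for x-polarized illumination, using (1.15)."""
    k = 2.0 * math.pi * n / wavelength
    alpha = math.asin(na / n)
    rho, phi_r = math.hypot(x, y), math.atan2(y, x)
    i0, i1, i2 = diffraction_integrals(rho, k, alpha)
    e_x = -1j * amp * (i0 + i2 * math.cos(2.0 * phi_r))
    e_y = -1j * amp * i2 * math.sin(2.0 * phi_r)
    e_z = -2.0 * amp * i1 * math.cos(phi_r)
    return abs(e_x) ** 2 + abs(e_y) ** 2 + abs(e_z) ** 2

peak = focal_plane_intensity(0.0, 0.0)
along_x = focal_plane_intensity(150e-9, 0.0) / peak   # along the polarization direction
along_y = focal_plane_intensity(0.0, 150e-9) / peak   # orthogonal to it
print(f"relative intensity 150 nm off axis: x = {along_x:.2f}, y = {along_y:.2f}")
```

At the same distance from the axis the intensity along x should come out higher than along y, which is the numerical counterpart of the statement that the focal spot of linearly polarized light is elongated along the polarization direction.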


The contributions of the electric fields of individual polarization directions to the
intensity in the focal plane as well as the overall intensity are shown in Fig. 1.3a. It can
be clearly seen that the symmetry of the polarization direction distribution on the exit
pupil is transferred to the focal plane. As a consequence, the intensity distribution
is not rotationally symmetric. For example, the focal spot is narrower in the direction
orthogonal to the polarization direction of the incident field. However, the focus can
be made symmetrical by the use of circularly polarized light, Fig. 1.3b. In many cases it
is not necessary to know the intensity distribution in the focus down to the last detail.
In this case it is useful to approximate it by a Gaussian function with a corresponding
full width at half maximum (FWHM). As you can see in Fig. 1.4, this approximation
is reasonably good in the central area.

Fig. 1.3 Contributions of the electric fields of individual polarization directions and overall intensity
in the focal plane for a linearly and b circularly polarized light in the entrance pupil of the objective lens.
Calculations were performed for an NA 1.4 oil immersion objective lens ($\lambda$ = 640 nm, n = 1.518).
Scale bar 250 nm

Fig. 1.4 Simulated intensity distribution and Gaussian approximation. The intensity distributions
in the x-y (left) and x-z (right) plane through the geometric focus a and the corresponding intensity
profiles along the x- (left) and the z-direction (right) b show the good agreement of the Gaussian
approximation in the central area. Calculations were performed for an NA 1.4 oil immersion objective
lens ($\lambda$ = 640 nm, n = 1.518) and circularly polarized light. Scale bars 250 nm




1.1.2 Incoherent Image Formation 


Far-field fluorescence microscopy has proven to be a powerful and versatile tool
in the life sciences and beyond [7-11]. Since it allows non-invasive imaging of the
interior of sufficiently translucent samples in three dimensions, it is well suited for
imaging biological samples, even under living conditions [12, 13]. Further, tagging
of target proteins or epitopes with fluorescent markers, e.g. by immunolabeling
with organic fluorophores or by expression of fluorescent fusion proteins, lends an
exceptional molecular specificity to the method [14-16]. In order to understand the
implementation of a fluorescence microscope, it is instructive to first consider the
fluorescence process on the molecular level.

Figure 1.5a illustrates the relevant molecular energy levels and transitions within
the singlet state in a Jablonski diagram. Here, $S_0$ denotes the electronic ground state
and $S_1$ the first excited electronic state. Please note that higher excited states as well as
the triplet states are neglected here because they are not necessary for a basic
understanding. The thick lines indicate the lowest vibrational energy level, whereas the
thin lines indicate levels with higher vibrational energy. At ambient temperatures, a
molecule typically resides in the lowest vibrational level of $S_0$ according to the
Boltzmann distribution [17]. By absorption of a photon of suitable energy, the molecule
can be excited to higher vibrational levels of $S_1$. From there, it relaxes radiationless
to the lowest vibrational level, which typically takes place within one picosecond or
less [18]. The emission of a fluorescence photon takes place as the molecule
spontaneously returns to higher vibrational levels of $S_0$. This transition may also occur
radiationless via internal conversion. However, for fluorescent molecules, which
are described here, this process is of minor importance. The time interval which
the molecule spends in $S_1$ is known as the fluorescence lifetime and depends on
the molecule itself and on its environment. It is typically several nanoseconds and
therefore three to four orders of magnitude longer than the characteristic time for
vibrational relaxation [18]. The cycle is completed by vibrational relaxation back to
the lowest vibrational level of $S_0$.

For the design of a fluorescence microscope, two consequences of the described 
excitation and emission process are of immediate relevance. First, due to the extended 
spectrum of vibrational levels, photons within a range of energies may excite the 
molecule. Likewise, fluorescence photons have a spectrum of energies. Second, due 
to the dissipation of energy by the vibrational relaxation after excitation, the emitted
photon's energy is always lower than that of the absorbed photon. Figure 1.5b
illustrates these aspects in terms of wavelength instead of energy, and shows the 
absorption and the emission spectrum of a typical fluorescent molecule. Since the 
photon wavelength scales inversely with the photon energy, the emission spectrum 
is at longer wavelengths than the absorption spectrum. This red-shift is called Stokes 
shift [17] and can be harnessed in the implementation of a fluorescence microscope. 
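The inverse relation between photon wavelength and photon energy behind the Stokes shift can be made explicit with a few lines of code. This is only an illustration: the two wavelengths (640 nm and 680 nm) are example numbers borrowed from the confocal calculation later in this chapter and do not describe the spectra of any particular dye, and the function names are made up for this sketch.

```python
# h*c expressed in eV*nm, so that E[eV] = HC_EV_NM / wavelength[nm]
HC_EV_NM = 1239.841984

def photon_energy_ev(wavelength_nm):
    """Photon energy in eV for a vacuum wavelength given in nm."""
    if wavelength_nm <= 0.0:
        raise ValueError("wavelength must be positive")
    return HC_EV_NM / wavelength_nm

def stokes_shift(absorbed_nm, emitted_nm):
    """Return (red-shift in nm, energy dissipated per photon in eV)."""
    shift_nm = emitted_nm - absorbed_nm
    loss_ev = photon_energy_ev(absorbed_nm) - photon_energy_ev(emitted_nm)
    return shift_nm, loss_ev

# Assumed example wavelengths, not measured absorption or emission maxima.
shift_nm, loss_ev = stokes_shift(absorbed_nm=640.0, emitted_nm=680.0)
print(f"red-shift: {shift_nm:.0f} nm, energy dissipated per cycle: {loss_ev:.3f} eV")
```

A positive energy loss corresponds to a positive red-shift, which is what allows the dichroic mirror and the bandpass filters to separate fluorescence from excitation light.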

Figure 1.5c shows a simple epi-illumination design of such a microscope. A broadband
light source, e.g. a metal-halide lamp or a light emitting diode, is spectrally
filtered by a bandpass filter (BP), such that the selected wavelength range lies within
the absorption spectrum of the respective fluorescent molecule. This excitation light
is reflected by a dichroic mirror through the objective lens into the sample. In a
typical implementation it is focused into the back aperture of the objective lens in
order to illuminate the sample over an extended area (wide-field illumination). All
fluorescent molecules inside the sample are equally excited and their fluorescence
is collected by the same objective lens. Since the fluorescence is red-shifted with
respect to the excitation light, it is transmitted by the dichroic mirror and thus
efficiently separated from the excitation light. After passing through a bandpass filter,
which blocks residual excitation light and suppresses unwanted room light outside of
the desired spectral detection window, the fluorescence is imaged by a tube lens onto
a camera. The transmission ranges of the excitation and detection BP are illustrated
by hatched regions in Fig. 1.5b.

Fig. 1.5 Principle of fluorescence microscopy. a Jablonski diagram of a fluorescent molecule.
By absorption of a photon the molecule can be excited from the electronic ground state, $S_0$, into
any vibrational level of $S_1$. Fast nonradiative relaxation into the lowest level of $S_1$ takes place
within a picosecond. The molecule can return into any vibrational level of $S_0$ by the spontaneous
emission of a photon (fluorescence). From there it relaxes nonradiatively into its lowest vibrational
level. b Absorption (blue) and emission (green) spectrum of a fluorescent molecule. The hatched
areas indicate the transmission range of the respective bandpass filters. c In a typical experimental
implementation the excitation light is focused into the back aperture of the objective lens in order
to generate a homogeneous light distribution within the sample. All fluorophores in the sample are
equally excited (right inset). The fluorescence signal is collected by the objective lens, separated
from the excitation light by a dichroic mirror and a detection bandpass (BP), and imaged onto an
area detector such as a CCD camera. The inset on the left depicts the image on the camera (green).
Note that the positions of the fluorophores are indicated only for illustration purposes

The fluorescence light emitted in the sample plane is imaged to the detection plane
by lenses and is therefore subject to diffraction. The image of a fluorescent molecule,
which can be seen as a point emitter of electromagnetic radiation, is therefore not a
point, but spread to an extended intensity distribution (cf. Sect. 1.1.1). The projection
of this pattern into the sample is called the point spread function (PSF) and it is an
important characteristic of a microscope, as it determines its resolution capability
(cf. Sect. 1.1.3). Since fluorescence emission is a spontaneous process, emitted light
of different sources has no constant phase relation. The light is incoherent and thus
the image of several point sources (e.g. a fluorophore distribution in the sample)
is composed of single overlapping PSFs. In mathematical terms, the image $I_{\mathrm{image}}$,
which is back-projected into the sample plane, is the convolution of the true object
$O$ and the PSF $h$:

$$I_{\mathrm{image}}(\mathbf{r}) = O(\mathbf{r}) * h(\mathbf{r}) \tag{1.18}$$
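A one-dimensional, discrete version of (1.18) may help to illustrate what incoherent image formation means in practice. In the sketch below the PSF is approximated by a sampled Gaussian (as suggested in Sect. 1.1.1); the pixel size, the Gaussian width of 100 nm and the emitter positions are assumptions chosen only for illustration.

```python
import math

def gaussian_psf(half_width, sigma_nm, pixel_nm):
    """Sampled Gaussian PSF, normalized so that its samples sum to one."""
    samples = [math.exp(-0.5 * (i * pixel_nm / sigma_nm) ** 2)
               for i in range(-half_width, half_width + 1)]
    total = sum(samples)
    return [value / total for value in samples]

def incoherent_image(obj, psf):
    """Discrete form of (1.18): the intensities of all emitters simply add up."""
    half = len(psf) // 2
    image = [0.0] * len(obj)
    for i, brightness in enumerate(obj):
        if brightness == 0.0:
            continue
        for j, weight in enumerate(psf):
            pos = i + j - half
            if 0 <= pos < len(obj):
                image[pos] += brightness * weight
    return image

pixel_nm = 10.0                       # assumed sampling of the object
obj = [0.0] * 101
obj[30] = obj[70] = 1.0               # two point emitters, 400 nm apart
psf = gaussian_psf(half_width=30, sigma_nm=100.0, pixel_nm=pixel_nm)
image = incoherent_image(obj, psf)
print(f"peak = {image[30]:.4f}, midpoint = {image[50]:.4f}, total = {sum(image):.4f}")
```

Because intensities rather than field amplitudes are added, the total signal is the sum of the individual emitter signals, and the two emitters appear as two overlapping copies of the PSF.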


1.1.3 Classical Resolution Limit 


The resolution of an optical system, e.g. a microscope, describes its ability to
distinguish two objects. Therefore, the spatial resolution of a microscope is given by the
minimum distance of two structures at which their images can be discerned.


The Abbe Limit 


By investigating line gratings, Ernst Abbe discovered in 1873 that the lateral resolution
of a light microscope solely depends on the wavelength of the light and the
numerical aperture of the objective lens used [19]. In order to resolve two adjacent
lines they have to be separated by at least:

$$d_{\min} = \frac{\lambda_0}{2\,\mathrm{NA}}, \tag{1.19}$$

with $\lambda_0$ being the vacuum wavelength of the light used. This fundamental limit
is often referred to as the diffraction limit. However, Abbe's considerations do not
allow conclusions on light-emitting objects or the axial resolution of the microscope.
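For orientation, (1.19) can be evaluated for typical numbers. The sketch below uses the 640 nm wavelength and the NA 1.4 oil objective that appear in the figure captions of this chapter; the NA 0.95 air objective is an additional, assumed example for comparison.

```python
def abbe_limit_nm(wavelength_nm, na):
    """Smallest resolvable grating period according to (1.19), in nm."""
    if wavelength_nm <= 0.0 or na <= 0.0:
        raise ValueError("wavelength and NA must be positive")
    return wavelength_nm / (2.0 * na)

# The NA 0.95 air objective is an assumed comparison case.
for label, na in (("air objective, NA 0.95", 0.95), ("oil objective, NA 1.40", 1.40)):
    print(f"{label}: d_min = {abbe_limit_nm(640.0, na):.1f} nm")
```

With red light the limit stays above 200 nm even for a high-NA oil immersion objective.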


The Rayleigh Criterion 
As already described in Sect. 1.1.1, the image of a point is not a point but a blurred
spot. If we assume an incident plane wave and neglect the direction change of the
polarization of the electric field by the focusing process, (1.12) simplifies to

$$\mathcal{E}(\mathbf{r}) = \mathcal{E}_x(\mathbf{r}) = -\frac{iA}{\pi} \int_0^{\alpha}\!\!\int_0^{2\pi} \sqrt{\cos\theta_s}\, \sin\theta_s\, e^{ik\,\mathbf{r}\cdot\mathbf{s}}\, d\phi_s\, d\theta_s. \tag{1.20}$$

In this case the integration with respect to $\phi_s$ can be readily performed and we
derive (compare (1.16))

$$\mathcal{E}(\mathbf{r}) = -2iA \int_0^{\alpha} \sqrt{\cos\theta_s}\, \sin\theta_s\, J_0(kr \sin\theta_s \sin\theta_r)\, e^{ikr \cos\theta_s \cos\theta_r}\, d\theta_s. \tag{1.21}$$

Note that $r \sin\theta_r$ is the distance of $\mathbf{r}$ from the optical axis and $r \cos\theta_r$
corresponds to the z-coordinate of $\mathbf{r}$. If we concentrate on the focal plane (z = 0) or the
optical axis (x = y = 0) and assume that the aperture angle is relatively low (paraxial
approximation), the integration with respect to $\theta_s$ can also be performed and we
obtain

$$\begin{aligned}
\mathcal{E}(x, y, 0) &= -iA \sin^2\alpha\; \frac{2 J_1\!\left(k \sqrt{x^2 + y^2}\, \sin\alpha\right)}{k \sqrt{x^2 + y^2}\, \sin\alpha} \\
\mathcal{E}(0, 0, z) &= -iA \sin^2\alpha\; e^{ikz\left(1 - \frac{1}{4}\sin^2\alpha\right)}\; \frac{\sin\!\left(\frac{kz}{4} \sin^2\alpha\right)}{\frac{kz}{4} \sin^2\alpha}
\end{aligned} \tag{1.22}$$

Fig. 1.6 Images of two point emitters with different distances and corresponding intensity
profiles along the dashed white lines. a For large separations both emitters can be easily identified.
b According to the Rayleigh criterion, the minimum distance to resolve both emitters is reached,
when the maximum of one emitter coincides with the minimum of the other. c Emitters that are
closer than this distance cannot be resolved in the image


The absolute square of $\mathcal{E}(x, y, 0)$ results in the so-called Airy pattern in the focal
plane. Figure 1.6 shows the images of two point-like emitters for three different
distances and the corresponding intensity profiles along the white dashed lines. Dashed
blue and yellow curves indicate the profiles for the individual emitters, whereas the
red lines show the profile when both emitters are radiating at the same time. When
the distance of both emitters is sufficiently large, they can easily be identified as
individual emitters. According to the Rayleigh criterion, two spatially separated point
sources can be discerned, when the maximum of the diffraction pattern of one point
emitter coincides with the first minimum of the other [6, 20]. This case is illustrated
in Fig. 1.6b. Note that the Rayleigh criterion only holds true for incoherently
radiating sources. When the distance between the emitters gets smaller they cannot be
distinguished any more, Fig. 1.6c.

The distance between the main maximum and the first minimum of the Airy
pattern is often used as a measure for the lateral resolution and is given by:

$$d_{xy} = 0.61\, \frac{\lambda_0}{\mathrm{NA}}. \tag{1.23}$$

Likewise, the resolution in axial direction (z) can be defined as the distance
between the main maximum and the first minimum in z-direction as:

$$d_z = 2.00\, \frac{n \lambda_0}{(\mathrm{NA})^2}. \tag{1.24}$$
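The numerical factor in (1.23) follows from the first zero of the Bessel function J_1 in the Airy pattern, and the residual dip between two emitters at the Rayleigh distance can be computed directly. The following sketch does both using only the standard library (J_1 is obtained from its integral representation); the wavelength, NA and refractive index are the example values used for the figures of this chapter, and all function names are illustrative.

```python
import math

def bessel_j1(x, terms=64):
    """J_1(x) from (1/pi) * integral_0^pi cos(t - x*sin(t)) dt (midpoint rule)."""
    step = math.pi / terms
    return sum(math.cos((m + 0.5) * step - x * math.sin((m + 0.5) * step))
               for m in range(terms)) / terms

def airy(v):
    """Normalized Airy intensity (2*J1(v)/v)^2 with v = k * rho * sin(alpha)."""
    return 1.0 if abs(v) < 1e-9 else (2.0 * bessel_j1(v) / v) ** 2

def first_zero_of_j1(lo=3.0, hi=4.5, iterations=60):
    """Bisection for the first positive zero of J1 (sign change inside [lo, hi])."""
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if bessel_j1(lo) * bessel_j1(mid) <= 0.0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

v0 = first_zero_of_j1()
rayleigh_factor = v0 / (2.0 * math.pi)        # d_xy = factor * lambda_0 / NA
wavelength_nm, na, n = 640.0, 1.4, 1.518
d_xy = rayleigh_factor * wavelength_nm / na   # lateral, compare (1.23)
d_z = 2.0 * n * wavelength_nm / na ** 2       # axial, (1.24)

# Two equally bright incoherent emitters separated by d_xy:
# intensity midway between them relative to the intensity at one emitter position.
dip = 2.0 * airy(0.5 * v0) / (airy(0.0) + airy(v0))
print(f"factor = {rayleigh_factor:.3f}, d_xy = {d_xy:.0f} nm, d_z = {d_z:.0f} nm, dip = {dip:.2f}")
```

Under these assumptions the intensity between the two emitters drops to roughly three quarters of the peak value, which explains why the Rayleigh distance can still be resolved when the signal-to-noise ratio is sufficient.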


The Full Width at Half Maximum Criterion 


As you can see from Fig. 1.6b, two points with a distance corresponding to the
Rayleigh criterion can still be resolved if the signal to noise ratio is sufficiently high.
This is no longer the case when their distance corresponds to the FWHM of the Airy
pattern, Fig. 1.6c. If the resolution is defined in such a way we get

$$d_{xy} = 0.51\, \frac{\lambda_0}{\mathrm{NA}} \quad\text{and} \tag{1.25}$$

$$d_z = 1.77\, \frac{n \lambda_0}{(\mathrm{NA})^2}. \tag{1.26}$$

Thus, the resolution of a microscope is according to this criterion limited to
approximately half the wavelength in lateral and twice the wavelength in the axial
direction. The FWHM definition of the resolution is particularly well suited if the
PSF is approximated by a Gaussian function which, as is well known, has no minima.
In this case the resolution then can either be expressed by its standard deviation, $\sigma$,
or its full width at half maximum ($\mathrm{FWHM} = 2\sqrt{2 \ln 2}\,\sigma$).
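The relations (1.25) and (1.26) and the conversion between FWHM and standard deviation of a Gaussian can be combined in a few lines. The sketch below again uses the example objective of this chapter (NA 1.4, n = 1.518, 640 nm); the function names are illustrative.

```python
import math

FWHM_PER_SIGMA = 2.0 * math.sqrt(2.0 * math.log(2.0))   # about 2.355

def fwhm_lateral_nm(wavelength_nm, na):
    """Lateral resolution according to (1.25)."""
    return 0.51 * wavelength_nm / na

def fwhm_axial_nm(wavelength_nm, na, n):
    """Axial resolution according to (1.26)."""
    return 1.77 * n * wavelength_nm / na ** 2

def gaussian_psf(x_nm, fwhm_nm):
    """Gaussian approximation of the PSF, normalized to one at the center."""
    sigma = fwhm_nm / FWHM_PER_SIGMA
    return math.exp(-0.5 * (x_nm / sigma) ** 2)

d_xy = fwhm_lateral_nm(640.0, 1.4)
d_z = fwhm_axial_nm(640.0, 1.4, 1.518)
sigma_xy = d_xy / FWHM_PER_SIGMA
print(f"d_xy = {d_xy:.0f} nm (sigma = {sigma_xy:.0f} nm), d_z = {d_z:.0f} nm")
print(f"Gaussian value at half the FWHM: {gaussian_psf(0.5 * d_xy, d_xy):.2f}")
```

By construction the Gaussian drops to one half of its maximum at a distance of FWHM/2 from the center, so either number characterizes the approximated PSF completely.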


1.1.4 Confocal Microscopy 


Wide-field fluorescence microscopy offers the possibility to image the entire field 
of view of the objective lens at once. However, wide-field illumination also has a 
decisive disadvantage as it not only excites dye molecules in the focal plane, but 
simultaneously in the entire sample volume. The light emitted by axially distant 
molecules is detected in addition to the signal from the fluorophores in the focal 
plane and generates a bright background in the image. This makes it difficult to 
acquire high quality data, especially in axially extended samples. 

In confocal microscopy [21], the axial extent of the sample region from which the
signal impinges on the detector can be narrowed down. For excitation, a point-like
light source is imaged into the sample plane. Since conventional fluorescent lamps
are spatially extended, their light has to be focused onto a pinhole and afterwards
collimated with a lens. The invention of lasers as bright and intense point-like light
sources rendered the use of an excitation pinhole obsolete and quickly led to the first
confocal laser scanning microscopes [22, 23].

Fig. 1.7 Principle of a confocal microscope. a In a typical experimental implementation a
collimated excitation laser is focused into the sample by the objective lens, creating a diffraction-limited
excitation spot. Fluorophores within this spot will be excited with a probability proportional to the
excitation light intensity. The fluorescence signal is collected by the objective lens, separated from
the excitation light by a dichroic mirror and a detection BP, and imaged onto a detector. A confocal
pinhole in front of the detector ensures that only fluorescence from a certain region is detected. The
inset on the right depicts the excitation spot and the accordingly excited fluorophores in the sample
plane, while the inset on the left depicts the image of the excited molecules in the pinhole plane.
The gray circle indicates the detection pinhole. Only fluorescence within the circle is detected. b
Additionally, fluorescence from planes distant to the focal plane is blocked by the detection pinhole.
This allows signal from thin optical sections to be analyzed. Note that the positions of the fluorophores
are indicated only for illustration purposes

The main components of a typical confocal fluorescence microscope are illustrated
in Fig. 1.7. A collimated excitation beam is focused into the sample by an objective
lens and generates a diffraction-limited excitation PSF, $h_{\mathrm{ex}}(\mathbf{r})$. The inset on the right
side in Fig. 1.7a depicts the focal plane and the excitation spot. Fluorescent markers
within $h_{\mathrm{ex}}(\mathbf{r})$ will be excited with a probability proportional to the light intensity
and can therefore emit fluorescence. In order to acquire an image, either the sample
must be scanned through the focus, or vice versa.

Each excited fluorophore can emit fluorescence, which is collected by the objective
lens, separated from the excitation light by a dichroic mirror and a detection BP and
imaged onto a pinhole. The pinhole ensures that only fluorescence from the direct
vicinity of the geometrical focal point is detected. As the light path is invertible this
can be interpreted as imaging the pinhole into the focal plane. This image is called
the detection PSF, $h_{\mathrm{det}}(\mathbf{r})$, and describes the probability to detect a photon emitted
at position $\mathbf{r}$. The gray circle in the left inset in Fig. 1.7a indicates the pinhole. Only
fluorescence originating from inside this circle is detected with high probability. Note
that each fluorescent molecule is imaged diffraction-limited.

Another advantage of the detection pinhole is illustrated in Fig. 1.7b. The
fluorescence from axially distant planes (with respect to the focal plane) is blocked by
the detection pinhole. Therefore, the key feature of the confocal microscope, unlike
conventional microscopes, is that it efficiently (and sharply) images only those
regions of a volume sample that lie within a thin section around the focal plane of
the microscope. In other words, it is able to reject (effectively attenuate) light from
out-of-focus regions of the sample [24-28].

The PSF of the confocal microscope is given by the probability that a fluorophore 
is excited multiplied with the probability that its fluorescence is detected: 


$$h_{\mathrm{conf}}(\mathbf{r}) = h_{\mathrm{ex}}(\mathbf{r}) \cdot h_{\mathrm{det}}(\mathbf{r}). \tag{1.27}$$


In the theoretical limit of an infinitesimally small detection pinhole and identical
wavelengths for illumination $\lambda_{\mathrm{ex}}$ and detection $\lambda_{\mathrm{det}}$, a confocal microscope improves
the resolution by a factor of $\sqrt{2}$ [24].
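
The factor of $\sqrt{2}$ can be made plausible with a minimal Python sketch of the product rule (1.27). It assumes, purely for illustration, that both PSFs are Gaussians of equal FWHM (the real PSFs are Airy-like) and that the pinhole is infinitesimally small; the function name and the value of 230 nm are hypothetical.

```python
import math

def product_fwhm(fwhm_ex, fwhm_det):
    """FWHM of the product of two centered Gaussians with the given FWHMs."""
    # exp(-4 ln2 r^2 / a^2) * exp(-4 ln2 r^2 / b^2) = exp(-4 ln2 r^2 (1/a^2 + 1/b^2))
    return 1.0 / math.sqrt(1.0 / fwhm_ex**2 + 1.0 / fwhm_det**2)

d = 230.0  # nm, illustrative FWHM of both PSFs (assumption)
improvement = d / product_fwhm(d, d)
print(round(improvement, 4))  # 1.4142
```

For unequal wavelengths the same function can be called with two different widths; the gain is then smaller than $\sqrt{2}$ relative to the narrower of the two PSFs.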

The influence of the size of the detection pinhole on the lateral (black line) and
axial (blue line) resolution, as well as on the detected signal (red line) is shown in
Fig. 1.8. The graphs are obtained by calculating and analyzing images of a point emitter
imaged with an oil-immersion objective lens (NA = 1.4, n = 1.518, $\lambda_{\mathrm{ex}}$ = 640 nm,
$\lambda_{\mathrm{det}}$ = 680 nm) using (1.17) and varying pinhole diameters. The pinhole diameter is
measured in Airy units (AU), with one AU corresponding to the diameter of the Airy
disc in the focal plane (1 AU = 1.22 $\lambda$/NA). The best resolution in all directions
is obtained with an infinitesimally small pinhole.
With increasing pinhole size, the resolution of the confocal microscope deteriorates.
The detected signal, however, grows with increasing pinhole diameter [29]. For
experimental purposes, a finite pinhole size is necessary to collect sufficient signal.
Often a pinhole size in the range of 1 AU is chosen as a tradeoff between collected
signal and resolution. Even though the resolution increase in the lateral direction is
almost negligible in this regime, the advantage of optical sectioning remains.
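
As a small worked example, the Airy unit for the parameters quoted above can be evaluated as follows. This is a sketch under two assumptions that the text does not state: the detection wavelength is used for $\lambda$, and a hypothetical 100x magnification is used to convert to a physical pinhole size in the image plane.

```python
def airy_unit_nm(wavelength_nm, numerical_aperture):
    """Diameter of the Airy disc in the focal plane: 1 AU = 1.22 * lambda / NA."""
    return 1.22 * wavelength_nm / numerical_aperture

au_det = airy_unit_nm(680.0, 1.4)          # about 593 nm in sample space
magnification = 100.0                      # hypothetical total magnification
pinhole_um = au_det * magnification / 1000.0
print(f"1 AU = {au_det:.1f} nm, physical pinhole diameter = {pinhole_um:.1f} um")
```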

For a circular detection pinhole, the pinhole function is given by

$$p(\mathbf{r}) = p(x, y, z = 0) = \begin{cases} 1 & \text{for } \sqrt{x^2 + y^2} \le p_0 \\ 0 & \text{otherwise} \end{cases} \tag{1.28}$$

with $p_0$ being the pinhole radius. The real detection PSF, $h_{\mathrm{det,real}}(\mathbf{r})$, is then given
by the convolution of $h_{\mathrm{det}}(\mathbf{r})$ with the pinhole function:


$$h_{\mathrm{det,real}}(\mathbf{r}) = h_{\mathrm{det}}(\mathbf{r}) \otimes p(\mathbf{r}). \tag{1.29}$$

Fig. 1.8 Influence of the pinhole diameter on the lateral (black) and the axial (blue) resolution as
well as the detected signal (red). Calculations are performed for an NA 1.4 oil-immersion objective
lens ($\lambda_{\mathrm{ex}}$ = 640 nm, $\lambda_{\mathrm{det}}$ = 680 nm, n = 1.518). Increasing the pinhole size increases the detected
signal, but also lowers the achievable resolution. Often pinholes with a size of 1 AU are utilized
in a confocal microscope, as at this size sufficient signal is collected while the optical sectioning
capability is mainly maintained
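
The convolution in (1.29) can be illustrated with a minimal Python sketch. It is not the calculation behind Fig. 1.8: the detection PSF is replaced by a Gaussian of FWHM $\lambda_{\mathrm{det}}/(2\,\mathrm{NA})$, the pinhole radius is expressed in sample-space coordinates (i.e. back-projected into the focal plane), and the convolution is approximated by a plain Riemann sum. All function names are hypothetical.

```python
import math

def gaussian_psf(x, y, fwhm):
    """Unnormalized Gaussian stand-in for h_det with the given FWHM."""
    return math.exp(-4.0 * math.log(2.0) * (x * x + y * y) / fwhm**2)

def real_detection_psf(x, y, fwhm, pinhole_radius, step=5.0):
    """Approximate (h_det convolved with p)(x, y) by a Riemann sum over the pinhole disc."""
    n = int(math.ceil(pinhole_radius / step))
    total = 0.0
    for i in range(-n, n + 1):
        for j in range(-n, n + 1):
            u, v = i * step, j * step
            if u * u + v * v <= pinhole_radius**2:   # p(u, v) = 1 inside the disc
                total += gaussian_psf(x - u, y - v, fwhm) * step * step
    return total

def fwhm_along_x(profile, x_max=2000.0, tol=0.01):
    """FWHM of a profile that decreases monotonically for x > 0 (bisection)."""
    half = 0.5 * profile(0.0)
    lo, hi = 0.0, x_max
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if profile(mid) > half:
            lo = mid
        else:
            hi = mid
    return lo + hi   # twice the half-width

d_det = 680.0 / (2.0 * 1.4)   # nm, assumed Gaussian FWHM of h_det
fwhm_small = fwhm_along_x(lambda x: real_detection_psf(x, 0.0, d_det, 15.0))
fwhm_large = fwhm_along_x(lambda x: real_detection_psf(x, 0.0, d_det, 296.0))
print(f"small pinhole: {fwhm_small:.0f} nm, pinhole of about 1 AU: {fwhm_large:.0f} nm")
```

With a very small pinhole the real detection PSF is essentially the diffraction-limited one, whereas a pinhole radius of roughly half an Airy unit broadens it considerably, in line with the trend described for Fig. 1.8.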


1.2 Fundamentals of STED Microscopy 


For a long time, the resolution of a microscope was considered to be limited by
diffraction. During the last decades, however, physico-optical methods that circumvent
the diffraction barrier have emerged in far-field fluorescence microscopy [30]. These new
super-resolution microscopy methods (in short, 'nanoscopy') were awarded
the Nobel Prize in Chemistry in 2014 and allow a resolution improvement of at
least one order of magnitude. The first method of this kind was stimulated emission
depletion (STED) microscopy, proposed in 1994 by Hell and Wichmann [31] and
demonstrated by Klar and Hell in 1999 [32].

Since their advent, super-resolution microscopy techniques have been versatile tools
for non-invasive investigations of structures. STED microscopes offer, for example,
the possibility to investigate intracellular structures in fixed [33, 34] and living
cells [35, 36] with, in principle, unlimited resolution [31]. A lateral resolution of
15 nm was demonstrated by imaging single fluorescent molecules [37], and resolutions
of 5.8 nm [38] and 2.4 nm [39] have been demonstrated on single nitrogen
vacancy centres in diamond. Furthermore, STED microscopes have been used to
study e.g. colloidal structures [40] and block copolymers [41, 42], and the underlying
principle has been used for STED lithography [43, 44]. As STED microscopy is
a cutting-edge technology, new, improved acquisition schemes are continuously being
developed and integrated (e.g. RESCue-STED [45] or DyMIN [46]).


1.2.1 Basic Idea


A fundamental breakthrough in the achievable resolution of light microscopes was 
realized when fluorescent markers were not only considered as contrast agents, but 
the molecular transitions of the markers were additionally used to specifically switch 
on and off the ability of a subset of markers to fluoresce. Hereby, the fluorescence 
from markers within a diffraction-limited spot can be temporally separated and thus 
be read out sequentially. 

As detailed above, confocal microscopy employs a targeted readout scheme. The
diffraction-limited focus is scanned through the sample and the detected fluorescence
is computationally assigned to the known position, thereby generating an image pixel
by pixel. In this mode, increasing the resolution is synonymous with decreasing the spatial
extent of the region from which the fluorescence is detected. STED microscopy
realizes this by employing the process of stimulated emission to actively switch off
fluorescent markers by forcing them to the electronic ground state $S_0$ without emission
of a fluorescence photon (Fig. 1.9a). This can be achieved by overlapping the
excitation spot with a spatially extended intensity distribution, $I(\mathbf{r}, t)$, featuring at
least one zero-intensity region, as off-switching requires $I(\mathbf{r}, t) > 0$ and is absent
for $I(\mathbf{r}, t) = 0$. If the STED focus has a ring shape (doughnut shape) with a central
intensity zero, molecules at its rim are switched off, while molecules in the center
are not. This results in a spatial narrowing of the fluorescent spot, whose extent then
defines the resolution of the microscope. The resolution, which in theory can become
arbitrarily high, depends not only on the applied STED intensity, but also on the photophysical
properties of the fluorophores. A detailed discussion of the photophysics
of dye molecules is presented in Sect. 1.2.2.

The key components of a STED microscope are illustrated in Fig. 1.9c. The setup
is based on a confocal microscope (cf. Fig. 1.7). Additionally, a STED laser, whose
wavelength is at the red end of the fluorescence spectrum (cf. Fig. 1.9b), e.g. a fluorescence
maximum at 654 nm and $\lambda_{\mathrm{STED}}$ = 775 nm for Abberior STAR 635P, is phase-modulated and
superimposed with the excitation laser. Further details on how to shape the STED
beam are given in Sect. 1.2.3. The emitted fluorescence is spectrally separated from
the laser beams and detected by a point detector (e.g. a single photon counting
module).

The right inset in Fig. 1.9c depicts the overlap of the excitation and STED beams 
in the sample plane. Only fluorescent molecules in the central region of the depletion 
pattern are allowed to remain in the excited state and can therefore emit fluorescence 
and contribute to the detected signal. The inset on the left side depicts the image 
plane with the gray circle indicating the detection pinhole. Usually, pulsed lasers are 
used as light sources for excitation and depletion in STED microscopy. The central 
inset indicates that a temporal delay between the excitation and depletion pulses is 
needed for efficient fluorescence suppression (cf. Sect. 1.2.2). Additionally, a
helical phase mask, which is used to create the doughnut-shaped depletion pattern by
imprinting a phase retardation from 0 to $2\pi$ onto the STED beam, is depicted.

Fig. 1.9 Principle of a STED microscope. a Jablonski diagram of a fluorescent molecule. In addition
to the processes of excitation and spontaneous emission, stimulated emission is now used to switch
off excited molecules in a targeted way. b Absorption and emission spectrum of a fluorescent
molecule. The depletion laser is shifted to the far right of the emission spectrum of the fluorophore.
c In comparison to the confocal microscope, an additional depletion laser is now superimposed with
the excitation beam. A helical phase mask imprints a phase retardation from 0 to $2\pi$ onto the STED
beam, which, when imaged into the sample plane, creates a doughnut-shaped depletion pattern. The right
inset shows the overlap of excitation and STED beam in the focal plane. Wherever the STED intensity 
is sufficiently high, excited fluorophores are driven into their off-state. Therefore, fluorescence is 
only emitted from sample regions where the STED intensity is negligible. This fluorescence is 
separated from the laser light and imaged onto a point-detector. Most STED microscopes utilize 
pulsed lasers for excitation and depletion. The central inset illustrates, that a time delay between 
the excitation and STED pulses is needed for an effective suppression of the fluorescence 


1.2.2 Basic Photophysics of Dye Molecules 


As described in the previous section, the key principle of STED microscopy is the 
inhibition of fluorescence emission by stimulated emission. The efficiency of this 
fluorescence depletion is a crucial parameter for the performance of a STED micro- 
scope and it depends on the interplay of the excitation light and the STED light with 
the fluorescent molecules. In the following, this will be discussed in detail with spe- 
cial attention to the timing between excitation and STED light and to the required 
STED power. Following [47], rate equations for the population of electronic states 
of fluorescent molecules will be formulated and their implications will be discussed. 

In the context of STED microscopy, a fluorophore can be modelled as a simple four-level
system, in which photo-bleaching, intermediate dark states and radiationless
decay from $S_1$ to $S_0$ are neglected (cf. Fig. 1.10a). Note that in comparison to Fig. 1.9a,
the spectrum of higher vibrational levels is merged into one level each, and transition
rates $k$ and population probabilities $N$ have been introduced. Specifically, $N_1$ and $N_3$
correspond to the population of the lowest vibrational level of $S_0$ and $S_1$, respectively.
$N_4$ and $N_2$ represent the population of the higher vibrational states $S_{1,\mathrm{vib}}$ and $S_{0,\mathrm{vib}}$
after excitation and fluorescence emission, respectively. Since $N_i$, $i = 1, 2, 3, 4$, are
probabilities, $\sum_i N_i = 1$.

Fig. 1.10 Four-level system of a fluorophore and laser pulse timing used in numeric calculations.
a A fluorescent molecule can be described as a four-level system with $S_0$ and $S_{0,\mathrm{vib}}$ being the lowest
and a higher vibrational level of the electronic ground state, respectively. $S_1$ and $S_{1,\mathrm{vib}}$ are the corresponding
levels in the first excited state. $N_1$ to $N_4$ denote the respective population probabilities. The
straight arrows indicate the excitation (blue) with its rate constant $k_{\mathrm{ex}}$, fluorescence emission (green,
$k_{\mathrm{fl}}$) and stimulated emission (red, $k_{\mathrm{STED}}$), while the wiggly arrows denote vibrational relaxation
($k_{\mathrm{vib2}}$ and $k_{\mathrm{vib4}}$). b Gaussian-shaped excitation pulse (blue) and STED pulse (red), which exhibit
their maximum intensities at time points $t_{0,\mathrm{ex}}$ and $t_{0,\mathrm{STED}}$. The delay between the pulses is $\Delta t$

The temporal evolution of the population probabilities N, to N4 can be described 
by a set of coupled rate equations: 


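$$\begin{aligned}
\frac{\partial N_1}{\partial t} &= k_{\mathrm{ex}}\,[N_4(t) - N_1(t)] + k_{\mathrm{vib2}}\,N_2(t) \\
\frac{\partial N_2}{\partial t} &= k_{\mathrm{fl}}\,N_3(t) - k_{\mathrm{STED}}\,[N_2(t) - N_3(t)] - k_{\mathrm{vib2}}\,N_2(t) \\
\frac{\partial N_3}{\partial t} &= -k_{\mathrm{fl}}\,N_3(t) + k_{\mathrm{STED}}\,[N_2(t) - N_3(t)] + k_{\mathrm{vib4}}\,N_4(t) \\
\frac{\partial N_4}{\partial t} &= -k_{\mathrm{ex}}\,[N_4(t) - N_1(t)] - k_{\mathrm{vib4}}\,N_4(t)
\end{aligned} \tag{1.30}$$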


Here, $k_{\mathrm{fl}}$, $k_{\mathrm{vib2}}$ and $k_{\mathrm{vib4}}$ are the rate constants for fluorescence decay from $S_1$ and
vibrational decay from $S_{0,\mathrm{vib}}$ and $S_{1,\mathrm{vib}}$, respectively. The rates for these spontaneous
processes are given by the inverse of the lifetimes of the starting states, with $k_{\mathrm{fl}}^{-1} = \tau_{\mathrm{fl}}$
in the range of several nanoseconds and $k_{\mathrm{vib}}^{-1} = \tau_{\mathrm{vib}}$ on the order of one picosecond or
less [18]. Note that excitation from $S_0$ to $S_{1,\mathrm{vib}}$ by the STED light has been neglected.

The rate constants for excitation $k_{\mathrm{ex}}$ and stimulated emission $k_{\mathrm{STED}}$, however,
depend on the intensity of the excitation and the STED light. They are given by the
product of the molecular cross-section $\sigma$ for the respective transition and the light
intensity $I$ divided by the photon energy $hc/\lambda_0$:

$$k = \frac{\sigma I}{hc/\lambda_0} \tag{1.31}$$

with the Planck constant $h$, the speed of light $c$ and the vacuum wavelength $\lambda_0$.
Please note that in order to make the notation easier to read, the indices ex and STED
are omitted here and in the following and are introduced again later.

When considering the third line in (1.30), it becomes obvious that for an efficient
depletion of fluorescence, the depopulation of $S_1$ by stimulated emission must not
only dominate over the spontaneous fluorescence emission, but also over the refilling
of $S_1$ from $S_{1,\mathrm{vib}}$ caused by vibrational relaxation after excitation. This suggests that
a pulsed scheme, which has already been implied in Fig. 1.9c, is beneficial [48]: an
excitation pulse is followed by a STED pulse. This separates excitation and stimulated
emission temporally, such that $S_1$ is not refilled during fluorescence depletion.
Further, pulsed lasers typically provide a high peak intensity, while the average laser
power and thus the light dose in the sample is kept rather low.

For modelling the pulsed scheme, the intensity-dependent rate constants $k_{\mathrm{ex}}$ and
$k_{\mathrm{STED}}$ in the rate equations (1.30) need to be formulated as functions of time. For this,
the laser pulses are assumed to have a Gaussian shape in time (cf. Fig. 1.10b) and
the time-dependent intensity $I(t)$ is

$$\frac{I(t)}{hc/\lambda_0} = J \sqrt{\frac{4 \ln 2}{\pi \tau^2}}\; e^{-4 \ln 2\, \frac{(t - t_0)^2}{\tau^2}} \tag{1.32}$$

with the photon fluence per pulse $J$ (measured in number of photons per area per
pulse), the temporal FWHM $\tau$ and pulse center position $t_0$.

Usually, the fluence per pulse in the focal plane cannot be measured directly
in the experiment. Instead, the laser power $P$ is readily accessible, which is why the
fluence $J$ will now be expressed in terms of power $P$. The total number of photons
per laser pulse $n$ is given by

$$n = \frac{P}{r_{\mathrm{rep}}\, hc/\lambda_0} \tag{1.33}$$

with the repetition rate of the laser pulses $r_{\mathrm{rep}}$ and the photon energy in the denominator.
The distribution of photon fluences in the focal plane $J(x, y)$ is then given
by

$$J(x, y) = n\, h(x, y) \tag{1.34}$$

with the focal probability distribution of a single photon $h$. Please note that in contrast
to the previous notation, here the PSF $h$ is not interpreted as an intensity distribution,
but as the probability for a photon to be found at a certain position. Therefore, $h$ is
normalized such that $\int\!\!\int_{-\infty}^{\infty} h(x, y)\, dx\, dy = 1$.
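
As a numerical illustration of (1.33), the following minimal Python sketch evaluates the number of photons per pulse. The input values (0.25 mW, 20 MHz, 775 nm) are the STED laser parameters used for the simulations later in this section; the function name is hypothetical.

```python
H = 6.62607015e-34   # Planck constant in J s
C = 2.99792458e8     # speed of light in m/s

def photons_per_pulse(power_w, rep_rate_hz, wavelength_m):
    """Number of photons per pulse, n = P / (r_rep * h * c / lambda_0), cf. (1.33)."""
    photon_energy = H * C / wavelength_m
    return power_w / (rep_rate_hz * photon_energy)

n_sted = photons_per_pulse(0.25e-3, 20e6, 775e-9)
print(f"{n_sted:.2e} photons per pulse")   # on the order of 5e7
```

Multiplying this number by the normalized PSF $h(x, y)$ then gives the fluence distribution of (1.34).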

Combining (1.31), (1.32), (1.33) and (1.34) gives the time- and position-dependent
rate constant

$$k(x, y, t) = \sigma \cdot \frac{P}{r_{\mathrm{rep}}\, hc/\lambda_0} \sqrt{\frac{4 \ln 2}{\pi \tau^2}}\; e^{-4 \ln 2\, \frac{(t - t_0)^2}{\tau^2}} \cdot h(x, y) \tag{1.35}$$

with a molecule-dependent, a laser-light-dependent and a microscope-dependent part.
For practical reasons, we simplify this expression further by approximating the
PSF with a Gaussian function with FWHM $d_{x,y} \approx \lambda/(2\,\mathrm{NA})$ (see Sect. 1.1.3 and (1.25))

$$h(x, y) \approx \frac{4 \ln 2}{\pi d_{x,y}^2}\; e^{-4 \ln 2\, \frac{x^2 + y^2}{d_{x,y}^2}} \tag{1.36}$$

and evaluate it at the geometric focus position

$$k_i(0, 0, t) \approx 3.32\; \sigma_i\, \frac{P_i}{r_{\mathrm{rep}}\, h c\, \lambda_i\, \tau_i}\; e^{-4 \ln 2\, \frac{(t - t_{0,i})^2}{\tau_i^2}}\; \mathrm{NA}^2 \tag{1.37}$$

where $i \in \{\mathrm{ex}, \mathrm{STED}\}$. Note that here the indices ex and STED are introduced again.
This equation depends on experimental parameters, which are easy to obtain, either
by direct measurements or by consulting data sheets. Substituting this expression
into the rate equations (1.30), we obtain the means to analyze the time-dependent
state population of a fluorescent molecule in the pulsed STED scheme. A quantity
of particular interest is the overall emitted fluorescence


$$F = \int_0^{\infty} k_{\mathrm{fl}}\, N_3(t)\, dt \tag{1.38}$$


and its dependence on experimental parameters, since the STED microscope’s per- 
formance is directly influenced by the efficiency of fluorescence depletion. 
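
The numerical prefactor 3.32 in (1.37) can be checked with a few lines of Python. The sketch assumes the Gaussian FWHM approximation $d_{x,y} \approx \lambda/(2\,\mathrm{NA})$; the variable names are chosen for illustration only.

```python
import math

# Peak of the temporal Gaussian in (1.32), in units of 1/tau
temporal_peak = math.sqrt(4.0 * math.log(2.0) / math.pi)
# Peak of the spatial Gaussian (1.36) with d = lambda / (2 NA), in units of NA^2 / lambda^2
spatial_peak = 16.0 * math.log(2.0) / math.pi

prefactor = temporal_peak * spatial_peak
print(round(prefactor, 2))  # 3.32
```

The remaining factors of (1.35), namely the cross-section, the laser power and the photon energy per pulse, carry over unchanged.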


Influence of Laser Parameters on Fluorescence Depletion 


For successful STED imaging in a pulsed scheme, it is particularly important to 
consider the influences of the relative timing between the laser pulses and the STED 
laser power on the efficiency of fluorescence depletion, since these two parameters 
need to be routinely set by the microscopist. Therefore, the overall emitted fluo- 
rescence (1.38) is simulated by numerically solving the rate equations (1.30). The 
rate constants $k_{\mathrm{ex}}$ and $k_{\mathrm{STED}}$ are assumed to be time-dependent and are analyzed at
position (x,y) = (0,0) according to (1.37). From an experimental point of view, this 
corresponds to measuring the fluorescence from a very small bead which is located in 
the very center of the superimposed focal spots of the excitation and the (not spatially 
shaped) STED light. 

For the simulations, fluorophore parameters are set to mimic a typical STED
fluorophore: $\tau_{\mathrm{fl}}$ = 3.3 ns, $\tau_{\mathrm{vib2}} = \tau_{\mathrm{vib4}}$ = 1 ps, $\sigma_{\mathrm{ex}} = 4.6 \cdot 10^{-16}$ cm$^2$, $\sigma_{\mathrm{STED}} =
4.6 \cdot 10^{-17}$ cm$^2$. Note that effects due to the polarization and the orientation of the
molecular transition dipole are neglected. The NA of the objective lens is assumed
to be 1.4. The laser wavelengths are set to $\lambda_{\mathrm{ex}}$ = 640 nm and $\lambda_{\mathrm{STED}}$ = 775 nm,
which are typical for STED imaging of red fluorophores, and the laser repetition rate
is assumed to be $r_{\mathrm{rep}}$ = 20 MHz.

Fig. 1.11 Influence of pulse delay $\Delta t$ (a) and STED power $P_{\mathrm{STED}}$ (b) on the relative fluorescence. a The
relative fluorescence shows a pronounced minimum at $\Delta t$ = 440 ps. Calculations were performed
for $P_{\mathrm{STED}}$ = 0.25 mW. b For optimized pulse delay $\Delta t$ = 440 ps, the relative fluorescence drops to
half at $P_{\mathrm{STED}}$ = 0.09 mW, which is indicated by the red line. All other parameters are as mentioned
in the main text


The question of suitable laser pulse lengths deserves a short comment: While a
very short excitation pulse in the range of a picosecond is beneficial, because fluorescence
decay during excitation can be neglected in this case, there is a clear constraint
on the shortest feasible pulse length of the STED laser. The rate for stimulated emission
from $S_1$ to $S_{0,\mathrm{vib}}$ is equal to the rate for re-excitation from $S_{0,\mathrm{vib}}$ back to $S_1$.
Therefore, at best, an equal population of both states can be achieved, unless there is
sufficient time for vibrational relaxation from $S_{0,\mathrm{vib}}$ to $S_0$. Only due to this drain of
$S_{0,\mathrm{vib}}$ can the state $S_1$ be efficiently depleted. The STED pulse length should therefore
be much longer than the vibrational lifetime [48]. On the other hand, it should be
shorter than the fluorescence lifetime, since STED photons arriving after the molecule
has already fluoresced do not have any effect and are therefore wasted. Considering
these aspects as well as specifications of commercially available laser systems,
excitation and STED pulse lengths are set to $\tau_{\mathrm{ex}}$ = 50 ps and $\tau_{\mathrm{STED}}$ = 800 ps.

The results of the simulations are shown in Fig. 1.11. It illustrates the relative
fluorescence $\eta$, i.e. the fluorescence that remains when STED
light is applied relative to the case without any STED light. The STED
power is $P_{\mathrm{STED}}$ = 0.25 mW, and $P_{\mathrm{ex}}$ = 10 µW is chosen such that no saturation
effects occur during excitation.
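
A simulation of this kind can be sketched in a few dozen lines of Python. The following is not the code used for Fig. 1.11 but a minimal illustration under stated assumptions: the rate equations (1.30) are integrated with a classical fourth-order Runge-Kutta scheme and a fixed 0.5 ps step, the peak rate constants follow (1.37), the cross-sections are converted from cm$^2$ to m$^2$, the integration window is truncated a few nanoseconds around the pulses, and the fluorescence emitted after the window is added analytically. All function and variable names are hypothetical.

```python
import math

H, C = 6.62607015e-34, 2.99792458e8      # Planck constant (J s), speed of light (m/s)
NA, R_REP = 1.4, 20e6                    # numerical aperture, repetition rate (Hz)
K_FL, K_VIB = 1.0 / 3.3e-9, 1.0 / 1e-12  # 1/tau_fl and 1/tau_vib (1/s)
TAU_EX, TAU_STED = 50e-12, 800e-12       # pulse FWHM (s)

def peak_rate(sigma_m2, power_w, wavelength_m, tau_s):
    """Peak rate constant at the geometric focus according to (1.37)."""
    return 3.32 * sigma_m2 * power_w * NA**2 / (R_REP * H * C * wavelength_m * tau_s)

def pulse(t, t0, tau):
    """Gaussian pulse envelope with unit peak value and FWHM tau."""
    return math.exp(-4.0 * math.log(2.0) * (t - t0) ** 2 / tau**2)

def simulate(p_sted_w, delay_s, p_ex_w=10e-6, dt=0.5e-12):
    """Return (emitted fluorescence F, sum of populations) for one pulse pair."""
    k_ex0 = peak_rate(4.6e-20, p_ex_w, 640e-9, TAU_EX)        # sigma_ex in m^2
    k_sted0 = peak_rate(4.6e-21, p_sted_w, 775e-9, TAU_STED)  # sigma_STED in m^2

    def rhs(t, n):
        n1, n2, n3, n4 = n
        k_ex = k_ex0 * pulse(t, 0.0, TAU_EX)
        k_sted = k_sted0 * pulse(t, delay_s, TAU_STED)
        return (k_ex * (n4 - n1) + K_VIB * n2,
                K_FL * n3 - k_sted * (n2 - n3) - K_VIB * n2,
                -K_FL * n3 + k_sted * (n2 - n3) + K_VIB * n4,
                -k_ex * (n4 - n1) - K_VIB * n4)

    def shifted(n, k, h):
        return tuple(a + h * b for a, b in zip(n, k))

    t = min(0.0, delay_s) - 1.5e-9
    t_end = max(0.0, delay_s) + 3.0e-9
    n, fluorescence = (1.0, 0.0, 0.0, 0.0), 0.0
    while t < t_end:
        k1 = rhs(t, n)                                     # classical RK4 step
        k2 = rhs(t + dt / 2, shifted(n, k1, dt / 2))
        k3 = rhs(t + dt / 2, shifted(n, k2, dt / 2))
        k4 = rhs(t + dt, shifted(n, k3, dt))
        new = tuple(a + dt / 6 * (p + 2 * q + 2 * r + s)
                    for a, p, q, r, s in zip(n, k1, k2, k3, k4))
        fluorescence += K_FL * 0.5 * (n[2] + new[2]) * dt  # trapezoidal rule for (1.38)
        n, t = new, t + dt
    # Once both pulses are over, everything left in S1 and S1,vib still fluoresces.
    return fluorescence + n[2] + n[3], sum(n)

f_sted, total = simulate(0.25e-3, 440e-12)
f_ref, _ = simulate(0.0, 440e-12)
eta = f_sted / f_ref
print(f"relative fluorescence eta = {eta:.2f}")
```

The step size of 0.5 ps is chosen to stay within the stability range of the explicit scheme for the fast vibrational relaxation; a stiff ODE solver would be a more robust choice for a production calculation. Calling `simulate` for a range of delays or STED powers gives curves of the type shown in Fig. 1.11, although exact agreement with the published values is not guaranteed by this sketch.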

In Fig. 1.11a the relative timing $\Delta t = t_{0,\mathrm{STED}} - t_{0,\mathrm{ex}}$ of the excitation and STED
pulse is varied. This so-called pulse delay spans a range from −1.5 ns to 12.5 ns,
where a positive value corresponds to the situation where the STED pulse peak
reaches the sample after the peak of the excitation pulse. If $\Delta t$ is too short, the STED
efficiency is low, either because STED photons reach the sample even before the
molecules have been excited or because they have not yet vibrationally relaxed to
the lowest level of $S_1$. This effect accounts for the steep slope on the left-hand side,
whose gradient is determined by $\tau_{\mathrm{STED}}$. If, however, the pulse delay is too long and
the STED pulse reaches the sample too late, some of the molecules will have already
fluoresced. The gradient of the right slope therefore depends on $\tau_{\mathrm{fl}}$. At the optimal time
delay the relative fluorescence is minimal, which is the case for $\Delta t$ = 440 ps in this
example.

Figure 1.11b shows the relative fluorescence at optimal time delay as a function of
$P_{\mathrm{STED}}$. The STED power at which the fluorescence drops to half is called the saturation
power $P_{\mathrm{sat}}$. The shape of the curve strongly resembles an exponential decay and,
indeed, if re-excitation of the dye by the STED light is neglected (simple two-level
system), $\eta$ is given by [49]

$$\eta = e^{-\sigma_{\mathrm{STED}}\, J_{\mathrm{STED}}}. \tag{1.39}$$


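Substituting $J_{\mathrm{STED}}$ using (1.34) and (1.33) gives the focal shape of the fluorescence
suppression induced by the applied STED light

$$\eta(x, y) = \exp\!\left(-\sigma_{\mathrm{STED}} \cdot \frac{P_{\mathrm{STED}}}{r_{\mathrm{rep}}\, hc/\lambda_{\mathrm{STED}}} \cdot h_{\mathrm{STED}}(x, y)\right) \tag{1.40}$$

again with a dye-dependent, a laser-light-dependent and a microscope-dependent part.

Evaluating the relative fluorescence $\eta$ at the center of the PSF and setting it to 1/2

$$\eta(0, 0) = \exp\!\left(-\sigma_{\mathrm{STED}}\, \frac{P_{\mathrm{STED}}}{r_{\mathrm{rep}}\, hc/\lambda_{\mathrm{STED}}}\, h_{\mathrm{cal}}(0, 0)\right) \overset{!}{=} \frac{1}{2} \tag{1.41}$$

results in an analytical expression for the saturation power $P_{\mathrm{sat}}$

$$P_{\mathrm{sat}} = \frac{\ln 2/\sigma_{\mathrm{STED}}}{h_{\mathrm{cal}}(0, 0)}\; r_{\mathrm{rep}}\, hc/\lambda_{\mathrm{STED}}. \tag{1.42}$$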

Note that a calibration PSF $h_{\mathrm{cal}}$ is introduced here in order to make the expression also
applicable to more elaborate shapes of $h_{\mathrm{STED}}$, e.g. those exhibiting a central intensity zero.
In practice, the thus defined $P_{\mathrm{sat}}$ is the power of the STED light which is needed
to suppress the fluorescence at the center of a Gaussian-shaped STED PSF by half. It
depends on the optical properties of the microscope, photophysical properties of the
dye and parameters of the STED laser and allows the relative fluorescence
(cf. (1.40)) to be written in a particularly simple form

$$\eta(x, y) = \exp\!\left(-\ln 2\; \zeta\; \frac{h_{\mathrm{STED}}(x, y)}{h_{\mathrm{cal}}(0, 0)}\right) \tag{1.43}$$

with the saturation factor $\zeta$ defined as $P_{\mathrm{STED}}/P_{\mathrm{sat}}$.
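
For orientation, (1.42) and (1.43) can be evaluated with the simulation parameters given above. The following Python sketch assumes that the calibration PSF is the normalized Gaussian of (1.36) with FWHM $\lambda_{\mathrm{STED}}/(2\,\mathrm{NA})$; the function names are hypothetical.

```python
import math

H, C = 6.62607015e-34, 2.99792458e8   # Planck constant (J s), speed of light (m/s)

def saturation_power(sigma_m2, wavelength_m, rep_rate_hz, numerical_aperture):
    """Saturation power according to (1.42) for a Gaussian calibration PSF."""
    # h_cal(0,0) of a normalized Gaussian with FWHM lambda / (2 NA), cf. (1.36)
    h_cal0 = 16.0 * math.log(2.0) * numerical_aperture**2 / (math.pi * wavelength_m**2)
    return math.log(2.0) / (sigma_m2 * h_cal0) * rep_rate_hz * H * C / wavelength_m

def relative_fluorescence(zeta, h_sted_over_h_cal0):
    """Relative fluorescence according to (1.43)."""
    return math.exp(-math.log(2.0) * zeta * h_sted_over_h_cal0)

p_sat = saturation_power(4.6e-21, 775e-9, 20e6, 1.4)   # sigma_STED converted to m^2
print(f"P_sat = {p_sat * 1e3:.3f} mW")
print(relative_fluorescence(1.0, 1.0))                  # one half, by definition of P_sat
```

This two-level estimate gives a saturation power of roughly 0.07 mW, somewhat below the 0.09 mW read from Fig. 1.11b. A difference of this kind is not surprising, since (1.42) neglects the finite pulse length, the pulse delay and the fluorescence emitted during the STED pulse, all of which are included in the four-level simulation.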


1.2.3 Shaping the STED Beam 


As already mentioned in Sect. 1.2.1, STED nanoscopy is based on the idea of limiting
the ability of molecules to fluoresce to the immediate vicinity of the geometric
focus. Since the fluorescence is inhibited by the process of stimulated emission, it is
necessary to shape the intensity distribution of the STED light such that it is zero in
the geometric focus. In order to utilize the maximum power of the available STED
light, the phase distribution of the electric field at the entrance pupil of the objective
lens $P(x_0, y_0)$ has to be designed such that the contributions of all secondary plane
waves $E_s$ interfere destructively at the focus.


Central Retardation 


Among the simplest ways to generate a central zero intensity is the generation of a
phase delay of $\pi$ in a central circular region of the aperture [50]

$$P(x_0, y_0) = \begin{cases} \pi & \sqrt{x_0^2 + y_0^2} < r_0 \\ 0 & \text{elsewhere.} \end{cases} \tag{1.44}$$


If the effect of focusing on the polarization direction of $E_s$ is neglected, $r_0$ is
exactly given by the radius of the entrance pupil divided by $\sqrt{2}$, so that the retarded
central disc covers half of the pupil area. For the usually
utilized high NA lenses, however, it is slightly smaller (compare (1.11) and Fig. 1.2).
As shown in Fig. 1.12a, the polarization contributions of the secondary plane waves
in the direction of the original polarization cancel each other out. The orthogonal
polarization directions also interfere destructively, since the phase mask does not
change the rotational symmetry of the corresponding field components on the exit aperture
(compare Figs. 1.2 and 1.3). The intensity distribution generated by illuminating
the phase mask with circularly polarized light is shown in Fig. 1.12b. Since the phase of
secondary plane waves originating from the central region of the aperture changes
significantly faster as a function of z than that of secondary plane waves from the
boundary region, spots of high constructive interference occur above and below the
focal plane. Therefore, this phase mask is usually used to increase the resolution in
the axial direction.
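
The condition for the central zero can be illustrated numerically. The following Python sketch works in a strongly simplified picture that is an assumption of this example and not the vectorial calculation behind Fig. 1.12: a scalar field, a uniformly illuminated pupil of unit radius and no polarization effects. In this picture the on-axis amplitude is simply the pupil integral of the phase factor $e^{iP}$.

```python
import math

def on_axis_amplitude(r0, pupil_radius=1.0, rings=20000):
    """Scalar on-axis focal amplitude for a uniformly filled pupil with a pi step at radius r0."""
    dr = pupil_radius / rings
    total = 0.0
    for i in range(rings):
        r = (i + 0.5) * dr
        phase_factor = -1.0 if r < r0 else 1.0   # exp(i*pi) = -1 inside, exp(0) = 1 outside
        total += phase_factor * r * dr           # ring area 2*pi*r*dr, constant factor dropped
    return total

best_r0 = 1.0 / math.sqrt(2.0)                   # pupil radius divided by sqrt(2)
print(abs(on_axis_amplitude(best_r0)) < 1e-3)    # True: destructive interference on axis
print(round(on_axis_amplitude(0.5), 2))          # 0.25: residual amplitude for a smaller disc
```

For high-NA focusing the weighting of the rings changes, which is consistent with the statement above that the optimal radius is then slightly smaller.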


Helical Retardation 


Another way to create a depletion pattern is to retard the phase of the STED
beam helically [51]

$$P(x_0, y_0) = \varphi \tag{1.45}$$


where $\varphi$ is the angle between the vector $(x_0, y_0)$ and the x-axis. The operation principle
of this phase mask is based on the same effect, which ensures that when focusing
a plane x-polarized wavefront, the y- and z-components of the electric field vanish
on the optical axis (compare Figs. 1.2 and 1.3). Since two mirror-symmetrical
points with respect to the optical axis always exhibit a phase difference of $\pi$, their
x- and y-components of the electric field cancel each other out at the geometric focus
(Fig. 1.13a). However, this also has the effect that the z-components of $E_s$ for
these points face in the same direction, which means that they interfere constructively
at the focal spot. This can be avoided by using circularly polarized light. For
the originally x-polarized part of the illuminating field, the effect still exists, but
now the z-components of the originally y-polarized part for two points which are
rotated by $\varphi$ = 90° with respect to the originally considered points face in the opposite
direction. This causes the z-components of the electric field of the two point
pairs to cancel each other out (Fig. 1.13a). Note that this effect is only achieved if the
handedness of the circular polarization matches the rotation direction of the helical phase mask. If this
is not the case, the effect is reversed and the field distribution has its maximum
z-component in the geometrical focus. The intensity distribution for the correct
handedness of the illuminating light field is depicted in Fig. 1.13b and forms a hollow
cylinder around the optical axis. It has been shown that helical phase retardation
generates the optimal inhibition pattern for isotropic resolution enhancement in the
focal plane [51].

Fig. 1.12 Central phase retardation. a A phase delay of the central region of the aperture by $\pi$ (see
inset in the top right corner) causes the x-components of the secondary plane waves of the central
and outer regions to cancel each other out. The illustration shows $E_s$ for two opposing points in the
inner and outer region when illuminated with x-polarized light. b Strength of the lateral and axial
electric field components and overall intensity in the vicinity of the focal spot for illumination with
circularly polarized light. Calculations were performed for an NA 1.4 oil immersion objective lens
($\lambda$ = 775 nm, n = 1.518). Scale bars 250 nm
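
The cancellation produced by the helical phase mask (1.45) can be illustrated with a short Python sketch. It only covers the scalar part of the argument (the lateral field components) and ignores the z-components and polarization effects discussed above; the names are hypothetical.

```python
import cmath
import math

def azimuthal_sum(samples=360):
    """Sum of exp(i*phi) over equally spaced azimuth angles on one pupil ring."""
    return sum(cmath.exp(1j * 2.0 * math.pi * k / samples) for k in range(samples))

# Two mirror-symmetrical points differ in phase by pi and therefore cancel pairwise.
phi = 0.3   # arbitrary azimuth angle in rad
pair = cmath.exp(1j * phi) + cmath.exp(1j * (phi + math.pi))

print(abs(pair) < 1e-12)             # True
print(abs(azimuthal_sum()) < 1e-9)   # True: no net amplitude on the optical axis
```

Because the cancellation holds for every ring of the pupil separately, it does not depend on the radial intensity profile of the beam in this simplified picture.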


1.2.4 Resolution 


In this section the effective PSF of a STED microscope is derived. It describes the 
volume in which fluorescence is still allowed, and whose spatial extent is a measure 
for the resolution. As an example, a 2D STED microscope utilizing a helical phase 
mask is considered. 

We assume that the excitation and STED light is applied as temporally separated
pulses with a pulse duration much shorter than the fluorescence lifetime. Photobleaching,
intermediate dark states and re-excitation of the dye by the STED light
are neglected (simple two-level model), and dye molecules are assumed to rotate
fast enough to average the orientation of their molecular transition dipole relative
to the polarization of the excitation and STED light. Under these conditions, the
effective PSF of the STED microscope $h_{\mathrm{eff}}$ is the product of the excitation PSF and
the remaining fluorescence in the presence of the STED light [49]

$$h_{\mathrm{eff}}(x, y) = h_{\mathrm{ex}}(x, y)\, \eta(x, y). \tag{1.46}$$

Fig. 1.13 Helical phase retardation. a A helical phase retardation (see inset in the top right corner)
causes the lateral field components of the secondary plane waves of opposing points to cancel each
other out. The illustration shows $E_s$ for two opposing points when illuminated with x-polarized
(red) and y-polarized (blue) light. The two point pairs are rotated by 90° with respect to each other.
The phase delay between the x-polarized and y-polarized light was set to $\pi/2$. b Strength of the
lateral and axial electric field components and overall intensity in the vicinity of the focal spot for
illumination with circularly polarized light. Calculations were performed for an NA 1.4 oil immersion
objective lens ($\lambda$ = 775 nm, n = 1.518). Scale bars 250 nm


According to Sect. 1.1.3, the excitation PSF can be well approximated by a symmetrical
2D Gaussian peak in the focal plane with a FWHM of $d_{x,y} \approx \lambda/(2\,\mathrm{NA})$

$$h_{\mathrm{ex}}(x, y) \propto e^{-4 \ln 2\, \frac{x^2 + y^2}{d_{x,y}^2}} \tag{1.47}$$

where a normalization constant has been neglected.

For sufficiently large saturation factors, the FWHM of the effective central spot
of the STED microscope is much smaller than the wavelengths used, and only the
shape of the STED intensity distribution in the vicinity of the focal spot determines
the shape of the central spot. In this region, the focal distribution $h_{\mathrm{STED}}$, which is
generated via helical phase retardation (cf. Sect. 1.2.3), can be well approximated
by a 2D parabola [52]

$$\frac{h_{\mathrm{STED}}(x, y)}{h_{\mathrm{cal}}(0, 0)} \approx 4 a^2 (x^2 + y^2). \tag{1.48}$$

Fig. 1.14 Pattern steepness and resolution in the case of helical (a) and central (b) phase retardation.
Top: Focal intensity distribution in the x-y-plane and the x-z-plane, respectively, through the geometric
focus. Center: The intensity profiles (black) along the dotted white lines can be well fitted
with a parabola (red) in the vicinity of the minimum in both cases. Bottom: With the fitted pattern
steepnesses and the indicated FWHM of the excitation PSF in the respective direction, the lateral
and axial resolution can be calculated according to (1.50). Calculations were performed for a 1.4
NA oil-immersion objective lens (n = 1.518), $\lambda_{\mathrm{ex}}$ = 640 nm and $\lambda_{\mathrm{STED}}$ = 775 nm


Here, $h_{\mathrm{cal}}(0, 0)$ is the calibration factor already known from Sect. 1.2.2 and $a$ is
the so-called pattern steepness, whose square is proportional to the curvature of $h_{\mathrm{STED}}$ in
the geometrical focus. Figure 1.14a shows the 2D STED intensity distribution (top)
(cf. Fig. 1.13b) and the good agreement of the parabolic fit (center). Please note
that the definition of the pattern steepness differs from a prior definition. Here, it is
normalized to $h_{\mathrm{cal}}(0, 0)$, while Harke et al. normalized $a$ to the maximal intensity
of $h_{\mathrm{STED}}(x, y)$ in the focal plane [52].

Combining (1.46), (1.47), (1.48) with (1.43) from Sect. 1.2.2 for $\eta(x, y)$ gives a
relatively simple expression for the effective STED PSF $h_{\mathrm{eff}}$, which represents a 2D
Gaussian peak shape

$$h_{\mathrm{eff}}(x, y) \propto e^{-4 \ln 2\, (x^2 + y^2) \left(\frac{1}{d_{x,y}^2} + a^2 \zeta\right)}. \tag{1.49}$$

Its FWHM along the lateral direction is

$$d_{\mathrm{STED}} = \frac{d_{x,y}}{\sqrt{1 + a^2 d_{x,y}^2\, \zeta}}. \tag{1.50}$$

For sufficiently large saturation factors, the attainable resolution of the STED
microscope is only governed by the pattern steepness and the saturation
factor

$$d_{\mathrm{STED}} \approx \frac{1}{a \sqrt{\zeta}}. \tag{1.51}$$
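
The intermediate steps behind (1.49) and (1.50) are short and can be written out explicitly. The following lines are a sketch that uses only the parabolic approximation (1.48), the relative fluorescence (1.43) and the Gaussian excitation PSF (1.47):

```latex
\begin{align*}
\eta(x, y) &= \exp\!\left(-\ln 2\; \zeta\; \frac{h_{\mathrm{STED}}(x, y)}{h_{\mathrm{cal}}(0, 0)}\right)
            \approx \exp\!\left(-4 \ln 2\; a^2 \zeta\, (x^2 + y^2)\right), \\
h_{\mathrm{eff}}(x, y) &= h_{\mathrm{ex}}(x, y)\, \eta(x, y)
            \propto \exp\!\left(-4 \ln 2\, \frac{x^2 + y^2}{d_{x,y}^2}\right)
                    \exp\!\left(-4 \ln 2\; a^2 \zeta\, (x^2 + y^2)\right), \\
\frac{1}{d_{\mathrm{STED}}^2} &= \frac{1}{d_{x,y}^2} + a^2 \zeta
            \quad\Longrightarrow\quad
            d_{\mathrm{STED}} = \frac{d_{x,y}}{\sqrt{1 + a^2 d_{x,y}^2\, \zeta}} .
\end{align*}
```

The last line follows from comparing the combined exponent with the general form $\exp(-4 \ln 2\, (x^2 + y^2)/d^2)$ of a Gaussian with FWHM $d$. For $a^2 d_{x,y}^2 \zeta \gg 1$ the first term under the square root can be dropped, which gives the limit (1.51).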


The dependence of the lateral STED resolution on the saturation factor $\zeta$ according
to (1.50) is shown in Fig. 1.14a (bottom). In this example, which was calculated for an
NA 1.4 oil immersion objective lens (n = 1.518, $\lambda_{\mathrm{ex}}$ = 640 nm, $\lambda_{\mathrm{STED}}$ = 775 nm),
a resolution of 50 nm, which corresponds to a resolution improvement by a factor of 5, is
achieved for a saturation factor $\zeta \approx 28$.
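
The resolution formula (1.50) and its inverse are easily put into code. In the following Python sketch the numerical inputs are assumptions made for illustration: a confocal FWHM of 250 nm is inferred from the quoted 50 nm and the factor of 5, and the pattern steepness is back-calculated from the quoted saturation factor of 28 rather than taken from the fit in Fig. 1.14.

```python
import math

def sted_fwhm(d_conf_nm, steepness_per_nm, zeta):
    """Effective FWHM of the STED PSF according to (1.50)."""
    return d_conf_nm / math.sqrt(1.0 + steepness_per_nm**2 * d_conf_nm**2 * zeta)

def required_saturation_factor(d_conf_nm, steepness_per_nm, target_nm):
    """Invert (1.50) for the saturation factor needed to reach a target FWHM."""
    return ((d_conf_nm / target_nm) ** 2 - 1.0) / (steepness_per_nm**2 * d_conf_nm**2)

d_conf = 250.0                                         # nm, inferred (assumption)
a = math.sqrt((5.0**2 - 1.0) / 28.0) / d_conf          # 1/nm, back-calculated (assumption)

print(round(sted_fwhm(d_conf, a, 28.0), 1))                   # 50.0
print(round(required_saturation_factor(d_conf, a, 25.0), 1))  # zeta needed for 25 nm
# For large zeta the exact expression approaches the limit (1.51)
print(sted_fwhm(d_conf, a, 1e4), 1.0 / (a * math.sqrt(1e4)))
```

Since the resolution scales with $1/\sqrt{\zeta}$ in the limit (1.51), halving the FWHM requires roughly four times the STED power.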

The resolution formula is not limited to the 2D STED pattern considered here, 
but is applicable whenever a parabolic fit can be reasonably applied in the vicinity of 
the zero intensity spot. This specifically also applies to the STED pattern, which is 
usually used for axial resolution increase (cf. Sect. 1.2.3 and Fig. 1.12b). Figure 1.14b 
shows the good agreement of the fit to the corresponding focal intensity distribution 
along the axial direction and presents the attainable axial resolution. Again an oil
immersion objective lens with NA = 1.4 was assumed ($\lambda_{\mathrm{ex}}$ = 640 nm, $\lambda_{\mathrm{STED}}$ =
775 nm). Because of the larger FWHM and the smaller pattern steepness, a saturation
factor of $\zeta = 28$ yields a resolution of only 103 nm in this case. Still, this corresponds
to a resolution increase by a factor of 6.

For imaging three-dimensional structures, a resolution increase in all three dimen- 
sions is often desired. This can be achieved by an incoherent superposition of both 
STED patterns. It was shown that a distribution of the total available power of 30% 
in the 2D and 70% in the axial pattern is favorable in terms of focal volume size and 
axial resolution [40]. 


1.3 Imaging Examples 


STED microscopy has become an indispensable tool in the life sciences, as it allows 
non-invasive uncovering of details hidden to conventional light microscopes. By now, 
nanoscopy has been successfully applied to various fields such as immunology, sig- 
naling, virology, bacteriology and cancer biology [53]. Particularly, the possibility to 
label different types of proteins simultaneously and to record their relative spatial dis- 
tribution at super-resolution offers important insight into protein co-localization and 
interaction. In order to demonstrate the current performance of STED microscopy, 
some selected examples of cell imaging are presented in the following. 

Fig. 1.15 Confocal and RESCue STED images of nuclear pore complex subunits (NUP98, red) and
Golgi apparatus (GM130, blue) in Vero cells. Samples were prepared by indirect immunolabeling 
using Abberior STAR RED and Abberior STAR ORANGE. Acquisition was performed using an 
Abberior Instruments Facility Line STED microscope. Shown is a maximum projection of a raw 
image stack. The inset shows that RESCue STED microscopy can resolve the ring-like organization 
of the nuclear pore complex proteins (as highlighted by the dotted white circles). Please note that 
the diameter of individual NUP98 rings is ~70 nm. For a better visualization, the STED image in 
the inset is smoothed. Data are courtesy of Abberior Instruments, Germany 


Example 1: Golgi Apparatus and Nuclear Pore Complex 

Figure 1.15 shows the complexly structured Golgi apparatus (blue) in a Vero cell. 
This cell organelle is known to be a collection and dispatch station of protein products 
from the endoplasmic reticulum. It synthesizes and modifies elements of the plasma
membrane and generates primary lysosomes. Next to the Golgi apparatus, the easily 
recognizable oval shaped cell nucleus is visible in red. More precisely, the image 
depicts the nuclear pore complex, a part of the nuclear envelope surrounding the cell 
nucleus, that allows transportation across the envelope. 

For confocal and STED imaging, the proteins GM130 (Golgi apparatus) and 
NUP98 (nuclear pore complex) have been immunolabeled with primary antibodies 
targeting the respective proteins and dye-labeled secondary antibodies (Abberior
STAR ORANGE, Abberior STAR RED) binding to the latter. It is evident that the
structures are resolved with much more detail in the RESCue STED image. Especially 
the ring-like arrangement of the nuclear pore complex proteins can be discerned (cf. 
dotted circles in the inset of Fig. 1.15). 

Fig. 1.16 Confocal and STED images of spectrin periodicity in primary rat hippocampal neurons.
Please note the characteristic ~190 nm beta II spectrin periodicity along distal axons (red,
blue) which is only visible in the STED image. Labelled structures: beta II spectrin (red, Abberior 
STAR635P), actin (blue, Abberior STAR 580). Acquisition was performed using an Abberior Instru- 
ments STEDYCON microscope. Shown are raw data. Data are courtesy of Abberior Instruments, 
Germany 


Example 2: Nanoscopy of Neurons 

Neurons are highly specialized cells, which are the basic building blocks of the 
nervous system and transmit information throughout the body. STED nanoscopy 
revealed that short actin filaments in neuronal axons, dendrites and spine necks are 
bridged by spectrin tetramers to form an ~190 nm periodic structure [53]. 

An exemplary measurement of the actin (blue) and beta II spectrin (red) distribu- 
tion in the axons of a primary rat hippocampal neuron is shown in Fig. 1.16. While in 
the confocal image only little information on the co-localization can be obtained, the 
characteristic periodicity of the beta II spectrin as well as the actin is easily seen in 
the STED image. The inset emphasizes previously concealed details that are clearly
visible in the STED image.


Example 3: Nanoscopy of Mitochondria 

Although mitochondria are best known for their role as the ‘power houses’ of the 
cell, they are also key players in executing apoptosis, a tightly regulated suicide 
program in eukaryotic cells [53]. Moreover, damage and subsequent dysfunction of 
mitochondria is known as an important factor for several human diseases. With a 
diameter of approximately 300-500 nm in cultured mammalian cells, their structure 
is not accessible to conventional light microscopy. 

STED nanoscopy revealed that Tom20, a membrane-spanning receptor protein of 
the translocase of the outer membrane complex, is found in clusters on the surfaces 
of mitochondria [53]. Super-resolution studies also showed that the nucleoids in 
mitochondria have a diameter of 70-110 nm and allowed conclusions on the number 
of copies of mitochondrial DNA (mtDNA) per nucleoid [53]. 

Fig. 1.17 Confocal and STED images of mitochondrial protein (Tom20, red) and the mitochondrial
genome (dsDNA, green) in Vero cells. Samples were prepared by indirect immunolabeling using 
Abberior STAR RED and Abberior STAR ORANGE. Acquisition was performed using an Abberior 
Instruments Facility Line STED microscope. Shown are raw data. Data are courtesy of Abberior 
Instruments, Germany 


Figure 1.17 depicts an image of the mitochondria in a Vero cell. The Tom20 proteins
are illustrated in red and the mtDNA in green. Again, in the STED image the
clustering and co-localization of both labeled structures is evident, whereas only few
conclusions can be drawn from the confocal image.


Chapter 2
Coherent X-ray Imaging


Tim Salditt and Anna-Lena Robisch 


Science, for me, gives a partial explanation for life. In so far as 
it goes, it is based on fact, experience and experiment. 
— Rosalind Franklin 


2.1 X-ray Propagation 


Coherent X-ray imaging is based on wave-optical propagation of electromagnetic 
waves, including free-space propagation and the interaction of short wavelength light 
with matter. Here we present an overview of fundamental principles of X-ray imaging 
and field propagation, with references to relevant literature. We first justify the use of 
scalar wave theory and approximations of paraxial (parabolic) wave equations. Then 
we show how to compute the wavefield at a distance d along the optical axis z with 
respect to a known field distribution in a plane at z = 0, assuming free space between 
planes z = 0 and z = d. Next, we address the projection approximation which is 
ubiquitous in X-ray imaging to describe the complex transmission function of an 
optically thin object. Finally, we present finite difference equations as a more general 
tool to treat X-ray propagation in matter and objects which cannot be approximated 
as thin. 


2.1.1 Scalar Diffraction Theory and Wave Equations


Propagation of stationary X-ray fields in matter can be described by the well-known 
Helmholtz equation (HE) 


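$$\Delta \mathbf{E}(\mathbf{r}, \omega) + k^2 n^2 \mathbf{E}(\mathbf{r}, \omega) = 0, \tag{2.1}$$


with vacuum wave number $k = \omega/c$, vacuum speed of light $c$, angular frequency
$\omega$, and the complex refractive index $n$ of the propagation medium. Here, $\mathbf{E}(\mathbf{r}, \omega)$
denotes the time-domain Fourier transform of the electric vector field $\mathbf{E}(\mathbf{r}, t)$


$$\mathbf{E}(\mathbf{r}, \omega) = \mathcal{F}_t[\mathbf{E}(\mathbf{r}, t)](\mathbf{r}, \omega) = \frac{1}{\sqrt{2\pi}} \int \mathbf{E}(\mathbf{r}, t) \, e^{i \omega t} \, \mathrm{d}t. \tag{2.2}$$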


The HE in the homogeneous form of (2.1) is derived from Maxwell's equations
for stationary fields in media, which are homogeneous, isotropic, non-magnetic,
non-conductive, and do not contain free charges. Further, the field intensity has to be
sufficiently small to neglect the non-linear response of matter. While this derivation
assumes a constant or at least piecewise constant index of refraction, the HE is still an
excellent description even for $n = n(\mathbf{r})$, i.e. spatially varying distributions of matter
as in an object to be imaged or in an optical device (refractive lens, zone plate, waveguide).
Also in a crystal, where the continuum approximation of the index of refraction
$n(\mathbf{r})$ must certainly break down, Fourier expansion with respect to the lattice vectors
still allows using (2.1). Indeed, propagation in a source-less but inhomogeneous
medium with spatially varying dielectric function $\varepsilon(\mathbf{r})$ (equivalently magnetic
permeability $\mu(\mathbf{r})$) is well described by (2.1). This is surprising, since inhomogeneous
media result in an inhomogeneous wave equation with corresponding source terms
on the right hand side of (2.1). Certainly, in an inhomogeneous medium $\varepsilon(\mathbf{r})$ is not
a slowly varying function on scales of the X-ray wavelength. However, the approximations
used to derive the HE are rescued by the simple fact that the X-ray index of
refraction in matter is very close to the vacuum index of refraction, i.e. $\varepsilon \approx \varepsilon_0$ and
$\mu \approx \mu_0$ for X-rays. This is both a curse and a blessing. It is a blessing, because the
weak interactions result in the bulk penetration capability for which hard X-rays are
famous, as well as the beneficial approximation of kinematic diffraction, which often
warrants a quantitative reconstruction. Note that multiple scattering events can be
safely neglected in most X-ray imaging applications, contrary to electron and visible
light optics. Yet, at the same time, weak interaction is a curse, because it severely
limits our ability to create efficient optical elements such as focusing devices.

Let us therefore briefly consider the (continuous) index of refraction $n(\mathbf{r}) =
1 - \delta(\mathbf{r}) + i\beta(\mathbf{r})$. Compared to other regions in the electromagnetic spectrum, the
high frequency of X-rays manifests itself in extremely small dispersion $\delta \ll 1$ and
absorption decrements $\beta \ll 1$, describing the phase shifts and absorption in matter,
respectively. For a given element and atom density $\rho_a$, the refractive index is given
in terms of the atomic form factor

$$n(\mathbf{r}) = 1 - \frac{r_e \lambda^2}{2\pi} \rho_a(\mathbf{r}) \left[ Z + f'(\omega) \right] + i \, \frac{r_e \lambda^2}{2\pi} \rho_a(\mathbf{r}) f''(\omega), \tag{2.3}$$


where Z is the atomic number, $\lambda$ the wavelength, $r_e = 2.82 \cdot 10^{-15}$ m the Thomson scattering length;
$f'(\omega) \ll Z$ and $f''(\omega) \ll Z$ are the dispersion and absorption corrections. For mixed
elemental composition, the indices are weighted averages according to the local
stoichiometry of elements.
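
To get a feeling for the orders of magnitude in (2.3), the following Python sketch evaluates the decrements for a single element. It is an illustration only: the function name is hypothetical, the input values (silicon, about 5.0e28 atoms per cubic metre, wavelength 0.154 nm) are assumed example numbers, and the corrections f' and f'' are set to zero, so that beta vanishes by construction.

```python
import math

R_E = 2.82e-15  # Thomson scattering length in m, as quoted in the text

def refractive_decrements(wavelength, rho_a, Z, f1=0.0, f2=0.0):
    """Return (delta, beta) of n = 1 - delta + i*beta following (2.3).

    wavelength : X-ray wavelength in m
    rho_a      : atom number density in 1/m^3
    Z          : atomic number
    f1, f2     : dispersion and absorption corrections f', f'' (default 0)
    """
    prefactor = R_E * wavelength ** 2 * rho_a / (2.0 * math.pi)
    return prefactor * (Z + f1), prefactor * f2

# Assumed example inputs: silicon at a wavelength of 0.154 nm, f' = f'' = 0.
delta, beta = refractive_decrements(1.54e-10, 5.0e28, 14)
print(f"delta = {delta:.2e}, beta = {beta:.2e}")
```

With these assumed inputs delta comes out at a few times 1e-6, which illustrates how close the X-ray index of refraction is to the vacuum value.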

Next, we want to justify scalar wave theory for X-rays. Strictly speaking, we
have to solve the HE for all components of the electric and magnetic field:
$\forall \, \psi \in \{E_x, E_y, E_z, B_x, B_y, B_z\}$. In general, the field components are coupled (since they
have to obey the full Maxwell system and/or different boundary conditions). For
example, given the HE for the electric field in (2.1), the solution must also fulfill


$$\nabla \cdot \mathbf{E} = 0 \quad \text{and} \quad \mathbf{B} = \frac{1}{i\omega} \nabla \times \mathbf{E}. \tag{2.4}$$


Instead, in scalar wave theory, one often treats only a single component

$$\Delta \psi + k^2 n^2 \psi = 0. \tag{2.5}$$


Note that this form can be further simplified to a second-order ordinary differential
equation, if one takes a two-dimensional Fourier transform with respect to the
perpendicular space directions $\mathbf{r}_\perp := (x, y)^T$


$$\left[ \frac{\partial^2}{\partial z^2} + \left( k^2 n^2 - k_\perp^2 \right) \right] \hat{\psi} = \left( \frac{\partial^2}{\partial z^2} + \beta^2 \right) \hat{\psi} = 0, \tag{2.6}$$


with $\hat{\psi} = \mathcal{F}_{\mathbf{r}_\perp}[\psi](\mathbf{k}_\perp)$ and $\beta := \sqrt{k^2 n^2 - k_\perp^2}$ [1]. Scalar wave theory is ubiquitous
in X-ray optics and X-ray imaging, but permissible only if polarization effects can
be neglected and field propagation of different polarization states is equivalent. This
is the case for many applications (apart from propagation in crystals for example),
since the relevant diffraction angles are much smaller than the Brewster angle.

Given a solution $\psi$ of the scalar HE in (2.5), how do we obtain meaningful
solutions and permissible polarisation states in terms of E and B? As shown in [1],
one can construct solutions of the Maxwell system by setting $\boldsymbol{\Psi} = \psi \, \mathbf{e}_p$ for any unit
vector $\mathbf{e}_p$, and then compute


$$\mathbf{E} = \frac{i}{k} (\nabla \psi) \times \mathbf{e}_p \quad \text{and} \quad \mathbf{B} = \frac{1}{c} \left( \frac{1}{k^2} \nabla (\mathbf{e}_p \cdot \nabla \psi) + n^2 \psi \, \mathbf{e}_p \right). \tag{2.7}$$


General solutions can be constructed from linear combinations of three independent
scalar potentials $\psi_x, \psi_y, \psi_z$ so that $\boldsymbol{\Psi} = \psi_x \mathbf{e}_x + \psi_y \mathbf{e}_y + \psi_z \mathbf{e}_z$. For the special
case of paraxial waves which can be written as the product of a slowly varying envelope
$u(\mathbf{r})$ and a fast oscillating term (see Fig. 2.1) with the wavevector pointing in
direction of the optical axis $\mathbf{e}_z$, i.e. $\mathbf{k} = k \mathbf{e}_z$, we have


$$\psi(\mathbf{r}) = u(\mathbf{r}) \, e^{ikz}, \tag{2.8}$$


and the corresponding electromagnetic field vectors are $\mathbf{E} \approx n \psi \, (\mathbf{e}_p \times \mathbf{e}_z)$ and $\mathbf{B} \approx \frac{n^2}{c} \psi \, \mathbf{e}_p$
[1]. Given the numerically computed scalar field, the local (time-averaged) energy
flux density also follows from the full electromagnetic field [1] via the Poynting
vector


$$\mathbf{S} = \frac{1}{\mu_0} \mathbf{E} \times \mathbf{B}, \tag{2.9}$$


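where $\mu_0$ is the free-space permeability. The energy flux density (averaged or integrated
over the exposure time) is often denoted as the optical intensity. For time-harmonic
fields and requiring Hermitian symmetry $\mathbf{E}(-\omega) = \mathbf{E}^*(\omega)$, one can write
the field based on a discrete sum of frequencies $\omega_i$ for $i \in \mathbb{N}$


$$\mathbf{E}(\mathbf{r}, t) = \mathcal{F}_t^{-1}[\mathbf{E}(\mathbf{r}, \omega)](\mathbf{r}, t) = \sqrt{\frac{2}{\pi}} \sum_{i \in \mathbb{N}} \Re[\mathbf{E}(\mathbf{r}, \omega_i)] \cos(\omega_i t) + \Im[\mathbf{E}(\mathbf{r}, \omega_i)] \sin(\omega_i t),$$


and the equivalent expression for B, which is a starting point to compute S. When
the fields are constructed from a single scalar potential $\psi$, which is slowly varying
in direction of $\mathbf{e}_p$ such that $\|\nabla (\mathbf{e}_p \cdot \nabla \psi)\| \ll |n^2 k^2 \psi|$, the time-averaged Poynting
vector can be approximated as [1]


$$\langle \mathbf{S} \rangle \approx \frac{c \varepsilon_0}{\pi} \sum_{i \in \mathbb{N}} \frac{1}{k_i} \Im\left[ \overline{n(\omega_i)^2 \, \psi(\omega_i)} \, \nabla \psi(\omega_i) \right], \tag{2.10}$$

with $k_i = \omega_i/c$. For paraxial beams it then follows

$$\langle \mathbf{S} \rangle \approx \frac{c \varepsilon_0}{\pi} \sum_{i \in \mathbb{N}} \Re\left[ n(\omega_i)^2 \right] |\psi(\omega_i)|^2 \, \frac{\nabla \arg(\psi(\omega_i))}{k_i}. \tag{2.11}$$


Hence, for a monochromatic paraxial beam, the magnitude of the time-averaged
Poynting vector, i.e. the optical intensity $I$, can be written as [1]


$$I = |\langle \mathbf{S} \rangle| \approx \frac{c \varepsilon_0}{\pi} \Re(n^2) \, |\psi|^2, \tag{2.12}$$


with an energy flow oriented in direction of the phase gradient.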
The above approximation of the Poynting vector was derived under the assumption
of paraxial waves. Indeed, many X-ray optical problems are well described by
forward directed paraxial diffraction. Thus, we can further approximate the scalar
HE and arrive at the parabolic wave equation. To this end, we start with a separation
ansatz for solutions of the HE in case of waves $\psi(\mathbf{r})$ propagating along the optical
axis z

$$\psi(\mathbf{r}) = u(\mathbf{r}) \, e^{ikz}, \tag{2.13}$$


where $u(\mathbf{r})$ is the (slowly varying) envelope of the wave field $\psi(\mathbf{r})$. For X-rays in
particular, we are often not interested in the rapidly oscillating term which changes 
sign on atomic length scales, and which is as irrelevant as the time-harmonic term. 
Instead, in numerical computation and plots, we are only interested in phases and 
amplitudes of u(r), which are well suited to monitor the small phase and inten- 
sity changes building up over many hundreds and thousands of atoms, see Fig. 2.1. 
Inserting (2.13) into the HE yields a differential equation for the envelope u 


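$$\nabla^2 \left( u(\mathbf{r}) e^{ikz} \right) + n^2(\mathbf{r}) k^2 u(\mathbf{r}) e^{ikz} = 0. \tag{2.14}$$

Working out the differential operators of the left-hand term

$$\nabla_\perp^2 u(\mathbf{r}) e^{ikz} + \partial_z \left( \partial_z u(\mathbf{r}) e^{ikz} + ik e^{ikz} u(\mathbf{r}) \right) = \nabla_\perp^2 u(\mathbf{r}) e^{ikz} + \left( \partial_z^2 u(\mathbf{r}) + 2 (\partial_z u(\mathbf{r})) ik \right) e^{ikz} - k^2 e^{ikz} u(\mathbf{r}), \tag{2.15}$$

and dividing by $e^{ikz}$, we obtain

$$\left[ \nabla_\perp^2 + \partial_z^2 + 2ik \partial_z + k^2 \left( n^2(\mathbf{r}) - 1 \right) \right] u(\mathbf{r}) = 0. \tag{2.16}$$


For paraxial beams, the second order derivative in z can be neglected, given
$\left| \frac{\partial^2 u}{\partial z^2} \right| \ll \left| k \frac{\partial u}{\partial z} \right|$ since $\partial_z^2 u \ll k^2 u$, leading to the paraxial (or parabolic) wave equation


$$\left[ \nabla_\perp^2 + 2ik \partial_z + k^2 \left( n^2(\mathbf{r}) - 1 \right) \right] u(\mathbf{r}) = 0. \tag{2.17}$$

The parabolic wave equation reduces to

$$\left[ \nabla_\perp^2 + 2ik \partial_z \right] u(\mathbf{r}) = 0 \tag{2.18}$$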


in free space with Gaussian beams as a well known family of solutions. Compared to 
the elliptic Helmholtz equation, its parabolic approximations offer higher numerical 
stability. Furthermore, solutions are more easily accessible in terms of boundary 
conditions. Initial values have to be specified in a plane at small z along with lateral 
boundary conditions, but no values at the high z boundary of the computational 
domain are required. For these reasons, paraxial approximations have become an 
important tool in X-ray optics [2-4], including generalizations to time-dependent
propagation problems via the spectral approach [1]. As in Schrödinger's equation,
the parabolic wave equation can be rearranged to


$$\frac{\partial u}{\partial z} = -\frac{1}{2ik} \left[ \nabla_\perp^2 + k^2 \left( n^2 - 1 \right) \right] u. \tag{2.19}$$


This form of the paraxial wave equation is typically used in X-ray optics. 
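
That Gaussian beams solve the free-space parabolic wave equation (2.18) can be checked numerically. The sketch below is an illustration under assumptions: it uses the textbook Gaussian envelope with complex beam parameter q = z - i z_R for the carrier convention exp(ikz), arbitrary units with k = 1, and simple central finite differences; all function names are hypothetical.

```python
import cmath

def gaussian_envelope(x, y, z, k, z_r):
    """Paraxial Gaussian beam envelope u for psi = u * exp(i k z)."""
    q = z - 1j * z_r  # complex beam parameter (assumed standard form)
    return cmath.exp(1j * k * (x * x + y * y) / (2.0 * q)) / q

def paraxial_residual(x, y, z, k, z_r, h=1e-3):
    """Finite-difference estimate of [laplace_perp + 2ik d/dz] u, cf. (2.18).

    Returns the residual and |2ik du/dz| as a scale for comparison.
    """
    def u(xx, yy, zz):
        return gaussian_envelope(xx, yy, zz, k, z_r)

    d2x = (u(x + h, y, z) - 2.0 * u(x, y, z) + u(x - h, y, z)) / h ** 2
    d2y = (u(x, y + h, z) - 2.0 * u(x, y, z) + u(x, y - h, z)) / h ** 2
    dz = (u(x, y, z + h) - u(x, y, z - h)) / (2.0 * h)
    return d2x + d2y + 2j * k * dz, abs(2j * k * dz)

residual, scale = paraxial_residual(x=1.0, y=0.5, z=3.0, k=1.0, z_r=10.0)
print(abs(residual) / scale)  # small compared to one
```

The printed ratio compares the residual of (2.18) with the size of one of its terms; it is limited only by the finite-difference step h.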

Turning back to the Helmholtz equation formulated as in (2.6), we complete 
this section by presenting a slightly different form of the parabolic wave equation, 
put forward in [1] and based on the approach in [5]. Due to the less restrictive 
approximations necessary for its derivation, we expect this form to have a larger 
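range of validity. In particular, the assumption $\left| \frac{\partial^2 u}{\partial z^2} \right| \ll \left| k \frac{\partial u}{\partial z} \right|$
($\partial_z^2 u \ll k^2 u$) is not required anymore. The differential operator in (2.6) can be written
as a product of two operators


$$\left( \frac{\partial^2}{\partial z^2} + \beta^2 \right) \hat{\psi} = \left( \frac{\partial}{\partial z} - i\beta \right) \left( \frac{\partial}{\partial z} + i\beta \right) \hat{\psi} = 0.$$


The scalar HE is hence solved by a solution of either of the differential equations
(forward and backward HE)


$$\frac{\partial \hat{\psi}}{\partial z} = i\beta \hat{\psi} \quad \text{or correspondingly} \quad \frac{\partial \hat{\psi}}{\partial z} = -i\beta \hat{\psi}. \tag{2.20}$$


In the context of paraxial beams, we are interested in the right equation describing
a wave vector oriented in $\mathbf{e}_z$ direction. Further, for paraxial beams, the support of $\hat{\psi}$ is
constrained to low spatial frequencies with $k_\perp^2 \ll k^2$. Therefore, one can approximate
the square root in $\beta$ through its first-order Taylor series around $k_\perp^2 = 0$ [1]


$$\beta = \sqrt{k^2 n^2 - k_\perp^2} \approx kn - \frac{k_\perp^2}{2kn}.$$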

Substitution into (2.20) (right) and transforming back to real space results in [1]


$$\frac{\partial \psi}{\partial z} = \frac{1}{2ikn} \Delta_\perp \psi - ikn \psi. \tag{2.21}$$

Equation (2.21) can again be formulated for the envelope u, defined by $\psi =
u \exp(-ikz)$ [1]

$$\frac{\partial u}{\partial z} = \frac{1}{2ikn} \Delta_\perp u - ik(n - 1) u. \tag{2.22}$$


For n = 1, (2.22) and (2.19) become identical, apart from the opposite sign convention
of the carrier phase, exp(−ikz) versus exp(ikz). As a consequence, in the regime
of hard X-rays, solutions of equations (2.22) and (2.19) will not differ. Yet, for soft
X-rays, the difference could become relevant.


2.1.2 Propagation in Free Space 


We first address the propagation in free space, following the angular spectrum 
approach as presented in the textbook of Paganin [6]. Again, we assume a time 
independent, monochromatic wave, i.e. we treat a single component $\psi_\omega(\mathbf{r})$ of the
spectrum with angular frequency $\omega$ and corresponding wavelength $\lambda$. A general
time-dependent field of finite bandwidth is then computed as superposition of its
monochromatic components by


$$\Psi(\mathbf{r}, t) = \frac{1}{\sqrt{2\pi}} \int \psi_\omega(\mathbf{r}) \exp[-i\omega t] \, \mathrm{d}\omega. \tag{2.23}$$


As discussed above, the single spectral component $\psi_\omega(\mathbf{r})$ must obey the free-space
Helmholtz equation

$$\left[ \nabla^2 + \left( \frac{2\pi}{\lambda} \right)^2 \right] \psi(\mathbf{r}) = 0, \tag{2.24}$$


where we have dropped the subscript $\psi_\omega \to \psi$ for simplicity of notation. Particular
solutions of the Helmholtz equation are plane waves


$$\psi_p(\mathbf{r}) = \exp[i \mathbf{k} \cdot \mathbf{r}] = \exp\left[ i (k_x x + k_y y + k_z z) \right], \tag{2.25}$$


where $k^2 = k_x^2 + k_y^2 + k_z^2 = (2\pi/\lambda)^2$. The z-dependent part of the plane wave can be
separated by


$$\psi_p(x, y, z) = \exp\left[ i (k_x x + k_y y) \right] \exp\left[ i z \sqrt{k^2 - k_x^2 - k_y^2} \right]. \tag{2.26}$$

The last equation entails an important message: Knowing the plane wave in the
source plane $\psi_p(x, y, z = 0) = \exp[i (k_x x + k_y y)]$, the electromagnetic field at any
distance z can be calculated by a simple multiplication with the so-called “free-space
propagator” $\exp\left[ i z \sqrt{k^2 - k_x^2 - k_y^2} \right]$. To propagate an arbitrary wave field $\psi$, we
express $\psi$ given in plane z = 0 by its Fourier transform $\hat{\psi}(k_x, k_y, z = 0)$


$$\psi(x, y, z = 0) = \frac{1}{2\pi} \iint \hat{\psi}(k_x, k_y, z = 0) \exp\left[ i (k_x x + k_y y) \right] \mathrm{d}k_x \, \mathrm{d}k_y. \tag{2.27}$$


This Fourier transform can be read as a superposition of plane waves. Since we can 
expand (almost) any wave field of interest in such a Fourier integral in the source plane 
and since we know how to propagate plane waves, we also know how to propagate 
general wave fields. It is thus possible to compute the field at any distance z from 
the given field in the xy-plane at z = 0. This allows interpreting an electromagnetic 
disturbance in a plane at z = 0 as a superposition of plane waves of fixed modulus 
of the wavevector leaving the plane of interest under different angles 


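$$\theta = \arcsin\left( \frac{\sqrt{k_x^2 + k_y^2}}{k} \right), \tag{2.28}$$


see also Fig. 2.2 and [7]. Each of these plane waves can be propagated from z = 0 to
any distance z > 0 by multiplication with the free-space propagator


$$\psi(\mathbf{r}) = \frac{1}{2\pi} \iint \hat{\psi}(k_x, k_y, z = 0) \cdot \exp\left[ i z \sqrt{k^2 - k_x^2 - k_y^2} \right] \cdot \exp\left[ i (k_x x + k_y y) \right] \mathrm{d}k_x \, \mathrm{d}k_y. \tag{2.29}$$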


Fig. 2.2 Angular spectrum approach for propagation of arbitrary wavefields. A wavefield in the
source plane is decomposed into plane waves propagating under different angles (2.28) with
respect to the optical axis. Each plane wave can be propagated by application of the free-space
propagator


Next, we restrict the wave fields of interest to paraxial waves, i.e. those which
propagate at small angles with respect to the optical axis z. In this case, $k_x$ and $k_y$ are
much smaller compared to $k_z$. As a consequence, we can approximate the free-space
propagator by

$$\exp\left[ i z \sqrt{k^2 - k_x^2 - k_y^2} \right] \approx \exp[ikz] \, G_F(k_x, k_y; z), \tag{2.30}$$

where

$$G_F(k_x, k_y; z) := \exp\left[ \frac{-i z (k_x^2 + k_y^2)}{2k} \right]. \tag{2.31}$$


Using (2.30) and formulating (2.29) as an operator-equation the propagated field 
at distance z is 


$$D_z[\psi(x, y, z = 0)](x, y, z) := \exp(ikz) \, \mathcal{F}^{-1} \left\{ G_F(k_x, k_y; z) \, \mathcal{F}[\psi(x, y, z = 0)] \right\}(x, y, z), \tag{2.32}$$


where $\mathcal{F}$ is the Fourier transform with respect to the xy-plane. Application of the
Fourier convolution theorem to (2.32) and expansion of the exponent in the integrand
leads to the Fresnel diffraction integral


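$$\psi_\omega(\mathbf{r}) = \frac{-ik \exp(ikz)}{2\pi z} \exp\left[ \frac{ik}{2z} (x^2 + y^2) \right] \iint \psi_\omega(x', y', 0) \exp\left[ \frac{ik}{2z} (x'^2 + y'^2) \right] \exp\left[ \frac{-ik}{z} (x x' + y y') \right] \mathrm{d}x' \, \mathrm{d}y'. \tag{2.33}$$


The chirp function $\exp\left[ \frac{ik}{2z} (x^2 + y^2) \right]$ merits a closer look. Reformulating the
argument of the exponential function results in


$$\frac{k}{2z} (x^2 + y^2) = \frac{\pi (x^2 + y^2)}{\lambda z}. \tag{2.34}$$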

The term $(x^2 + y^2)$ ‘measures’ the squared spatial cross-section of features in
the input wavefield. Let a be the smallest such structure of interest, so that the argument
of the exponential, the so-called chirp, is governed by the dimensionless ratio
$\frac{a^2}{\lambda z} =: F$ called the Fresnel number. For large propagation distances (compared to
wavelength and a), the Fresnel number approaches zero and hence the chirp function
is close to unity: The Fresnel diffraction integral becomes the Fourier transform
of the wavefield. This limiting case is known as the Fraunhofer far field approximation.
Fresnel numbers close to one indicate the optical near field: The chirp function
cannot be neglected. Indeed, as we will see below, it is the chirp function that makes
numerical free-space propagation challenging. Importantly, paraxial propagation is
governed only by a single parameter F, so that all simulations can be carried out in
natural units of pixel size and with a single unitless parameter F.
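
As a small numerical illustration of the Fresnel number and of the chirp argument in (2.34), consider the following Python sketch. The function names and the example values (feature size 1 µm, wavelength 0.1 nm) are assumptions chosen only to show how F drops with increasing distance.

```python
import math

def fresnel_number(a, wavelength, z):
    """Fresnel number F = a^2 / (lambda * z) for feature size a and distance z."""
    return a ** 2 / (wavelength * z)

def chirp_argument(x, y, wavelength, z):
    """Argument of the chirp, pi * (x^2 + y^2) / (lambda * z), cf. (2.34)."""
    return math.pi * (x ** 2 + y ** 2) / (wavelength * z)

# Assumed example values, all lengths in metres.
wavelength = 1.0e-10
a = 1.0e-6
for z in (1.0e-3, 1.0e-2, 1.0):
    print(f"z = {z:g} m: F = {fresnel_number(a, wavelength, z):g}")
```

Evaluated at a lateral offset equal to the feature size a, the chirp argument is pi times F, so F of order one means the chirp phase varies by about pi across the feature, while for F much smaller than one the phase variation across the feature becomes negligible.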

Before we turn to numerical implementation, however, a brief comment on the
choice of relevant feature size a is reasonable. A natural choice in the discrete setting
of numerical calculations is the pixel size $a = \Delta x$. The Fraunhofer regime $F \ll 1$ is
then quickly reached with increasing z. These ‘far field’ holograms, however, do not
look like far field diffraction patterns (squared Fourier transform of the object), which
are usually associated with the Fraunhofer approximation. The reason is that there is
always another length scale a to be considered, namely the beam size. Only when
the Fresnel number, computed for all object length scales and the length scale of
the beam, is much smaller than one, do we get the conventional Fraunhofer diffraction
pattern without mixing of object wave and primary wave.


2.1.3 The Fresnel Scaling Theorem 


Fig. 2.3 a Illustration of the hologram created by a single point in plane wave illumination (parallel
beam), and b the equivalent for spherical wave illumination (cone beam). According to the Fresnel
scaling theorem, the holograms are identical up to a simple variable transformation, including the
geometric magnification M and an effective defocus distance $z_{\mathrm{eff}}$. c Cone beam geometry in unitless
variables, taking a Gaussian beam as an example for a coherent and divergent beam, see text for
explanations


Consider a setting where an object is illuminated by the incident wavefield $\psi_i$.
Two limiting cases of the illumination geometry are of particular importance: plane
wave illumination and spherical wave illumination. The spherical beam geometry
(also denoted as cone-beam geometry) can be used to implement radiography at 
the nanoscale, well below the resolution limit of the detector pixel size $\Delta x$. As
sketched in Fig. 2.3a, b, this requires X-ray nano-focusing optics [8, 9], as presented
in Chap. 3 and a defocus geometry with object and detector positioned behind the
focus at distances $z_{01}$ and $z_{02}$, respectively, see also Chap. 13. Focusing serves two
purposes: Firstly, the photon density is increased in the object plane, as long as the
defocus position is smaller than the focal length of the optic, $z_{01} < f$. Secondly,
it magnifies the near field pattern (hologram) and thus enables a resolution below
the detector pixel size $\Delta x$. To discuss the effect of the spherical illumination function
$\psi_i$ and following the projection approximation, we slightly rewrite the input
wavefield as a product of illumination and a complex-valued object transfer function
$\psi_\omega(x', y', 0) = \psi_i(x', y', 0) \, \tau(x', y')$ and insert it in (2.33) yielding


$$\psi(\mathbf{r}) = A(\mathbf{r}) \iint \mathrm{d}x' \, \mathrm{d}y' \, \psi_i(x', y') \, \tau(x', y') \cdot \exp\left[ \frac{ik}{2 z_{12}} (x'^2 + y'^2) \right] \exp\left[ \frac{-ik}{z_{12}} (x x' + y y') \right], \tag{2.35}$$


where $z_{12} := z_{02} - z_{01}$ denotes the distance between object and detector. For simplicity
of notation, the subscript $\omega$ was skipped, and the prefactors were abbreviated
by the complex valued $A(\mathbf{r})$. The factorization underlying the projection approximation
will be justified further below. Equation (2.33) essentially describes the case
of a plane wave incident illumination $\psi_i = e^{ikz}$ with $\tau(x', y', 0) = \psi(x', y', 0)$, i.e.
no sample in the beam path. However, for point source illumination with


$$\psi_i = \exp\left[ ik \sqrt{x'^2 + y'^2 + z_{01}^2} \right] \approx \exp(ik z_{01}) \exp\left[ \frac{ik}{2 z_{01}} (x'^2 + y'^2) \right], \tag{2.36}$$


the integral becomes

$$\psi(\mathbf{r}) = A(\mathbf{r}) \iint \mathrm{d}x' \, \mathrm{d}y' \, \tau(x', y') \exp(ik z_{01}) \exp\left[ \frac{-ik}{z_{12}} (x x' + y y') \right] \exp\left[ \frac{ik}{2 z_{01}} (x'^2 + y'^2) \right] \exp\left[ \frac{ik}{2 z_{12}} (x'^2 + y'^2) \right], \tag{2.37}$$

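where $A(\mathbf{r})$ may also account for the amplitude decrease of the spherical wave. With
the geometric magnification given by


$$M = \frac{z_{01} + z_{12}}{z_{01}} \tag{2.38}$$


and the definition of an effective propagation distance $z_{\mathrm{eff}}$

$$z_{\mathrm{eff}} = \frac{z_{01} z_{12}}{z_{01} + z_{12}} = \left( \frac{1}{z_{01}} + \frac{1}{z_{12}} \right)^{-1} = \frac{z_{12}}{M}, \tag{2.39}$$


we obtain


$$\psi(\mathbf{r}) = A(\mathbf{r}) \exp(ik z_{01}) \iint \mathrm{d}x' \, \mathrm{d}y' \, \tau(x', y') \exp\left[ \frac{ik}{2 z_{\mathrm{eff}}} (x'^2 + y'^2) \right] \exp\left[ \frac{-ik}{z_{\mathrm{eff}}} \left( \frac{x}{M} x' + \frac{y}{M} y' \right) \right], \tag{2.40}$$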

i.e. wave propagation is equivalent in plane and spherical geometry, if a simple
coordinate transformation is performed. This result is known as the Fresnel Scaling
Theorem (FST). Following the FST, numerical propagation, which will be discussed
in the next section, is always performed in the effective parallel beam coordinate
system. But before doing so, we note down an immediate consequence of spherical
beam propagation, which is also illustrated in Fig. 2.3. While the Fresnel number
F decreases with increasing distance between object and detector, the opposite is
true for the divergent beam setting. As $z_{01}$ is reduced (at constant $z_{02}$), the imaging
regime becomes more and more holographic (decrease in F) since the decrease in
effective pixel size $\Delta x_{\mathrm{eff}} = \Delta x / M$ enters quadratically and outweighs the effect of
$z_{\mathrm{eff}}$. For the minima of the contrast transfer function, which will be discussed further
below, this means that their number increases as $z_{01}$ is decreased (or equivalently M
is increased), see the dashed lines in Fig. 2.3c. In order to plot the divergent beam
geometry in unitless variables, the intensity of a Gaussian illumination can serve as
a model for the diffraction limited beam


$$I(\mathbf{r}) = I_0 \left( \frac{w_0}{w(z)} \right)^2 \exp\left[ \frac{-2 (x^2 + y^2)}{w(z)^2} \right] \tag{2.41}$$


with waist $w_0$, $w(z) = w_0 \sqrt{1 + (z/z_R)^2}$ and Rayleigh length $z_R = \pi w_0^2 / \lambda$. Next, we
parameterize the position of the object along the optical axis by $Z = z_{01}/z_{02} = 1/M$,
such that this inverse magnification varies between $1/M_{\max} = z_R/z_{02}$ and one. The
corresponding lateral width of the beam varies between $w_0$ and $w_{\max} = \alpha z_{02}$, with
divergence $\alpha$ defining the numerical aperture. In unitless variables we designate
the beam width by $W = w(Z)/w_{\max}$, and plot W as a function of 1/M on double-logarithmic
scale, see Fig. 2.3c. Accordingly, the blue line shows the increase in beam
width, while the orange line designates the effective pixel size $\Delta x_{\mathrm{eff}} = (\Delta x / z_{02}) z_{01}$,
i.e. the demagnified size of a single detector pixel. In unitless variables this
line reaches $W = \Delta x / w_{\max}$ for Z = 1 and crosses the dashed red horizontal line
at $Z = w_0 / \Delta x$. The dashed red line separates lateral length scales which can be
resolved (above) from those which are unresolved (below), based on the limited
numerical aperture (N.A.).
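
The variable transformation of the Fresnel scaling theorem, (2.38) and (2.39), is easily put into a few lines of Python. The sketch below is an illustration with assumed example values (detector 5 m behind the focus, 6.5 µm pixels, wavelength 0.1 nm) and a hypothetical function name; the Fresnel number is computed for the effective pixel size.

```python
def fresnel_scaling(z01, z02, pixel_size, wavelength):
    """Effective parallel-beam parameters of a cone-beam geometry.

    z01, z02   : focus-object and focus-detector distance
    pixel_size : physical detector pixel size
    Returns magnification M, effective distance z_eff, effective pixel
    size dx_eff and the Fresnel number of the effective pixel.
    """
    z12 = z02 - z01
    magnification = (z01 + z12) / z01          # (2.38)
    z_eff = z12 / magnification                # (2.39)
    dx_eff = pixel_size / magnification
    fresnel = dx_eff ** 2 / (wavelength * z_eff)
    return magnification, z_eff, dx_eff, fresnel

# Assumed example geometry, all lengths in metres.
for z01 in (0.5, 0.05, 0.025):
    M, z_eff, dx_eff, F = fresnel_scaling(z01, 5.0, 6.5e-6, 1.0e-10)
    print(f"z01 = {z01}: M = {M:.0f}, z_eff = {z_eff:.4f}, "
          f"dx_eff = {dx_eff:.2e}, F = {F:.2e}")
```

Moving the object closer to the focus at fixed detector position increases M and lowers F, in line with the statement that the regime becomes more holographic as the object-focus distance is reduced.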


2.1.4 Numerical Implementation of Free-Space Propagation


The numerical implementation of free-space propagation is not trivial. In particular, 
sufficient sampling of chirp functions has to be guaranteed. Extensive literature can
be found in [7, 10, 11]. There are two main options to implement propagation. The
first uses two Fourier transformations and has equal coordinate systems in source and
destination plane, the second is based on a single transformation and can work with
differently sized and sampled planes. We start with the first option, which is based
on (2.29), according to which propagation can be designed as a filter operation of
the wavefield in Fourier space. The corresponding filter is the free-space propagator,
which is given by (2.31) in paraxial approximation $G_F(k_x, k_y; z)$ as a function in
reciprocal space. The propagator function can also be written down analytically in
real space (called the impulse response function), and then be numerically Fourier
transformed. Importantly, the coordinate systems of input and output field are identical,
and the propagation is based on two fast Fourier transform (FFT) operations.
For the second approach, the propagation can be directly calculated based on (2.33), 
involving a single Fourier transform, either by a single FFT or by other numerical 
solutions of the Fourier integral, e.g. for non-equidistant sampling. This approach is 
well suited for cases where the pixel sizes between input and output must vary, e.g. 
to cover the field of view in a detector after diffraction broadening. 
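
The first option (filtering in Fourier space with two transforms) can be sketched in one dimension as follows. This is a minimal illustration, not a production implementation: it uses a naive discrete Fourier transform from the Python standard library instead of an FFT, works in natural units of the pixel size with the pixel Fresnel number F as the only parameter, omits the global phase factor exp(ikz), and all function names are hypothetical. In these units the paraxial propagator (2.31) for the frequency index m on an N-pixel grid becomes exp(-i pi (m/N)^2 / F).

```python
import cmath
import math

def dft(values, sign):
    """Naive discrete Fourier transform; sign = -1 forward, +1 inverse (unnormalised)."""
    n = len(values)
    return [
        sum(v * cmath.exp(sign * 2j * math.pi * m * j / n) for j, v in enumerate(values))
        for m in range(n)
    ]

def propagate(field, fresnel_number):
    """Paraxial free-space propagation of a 1D field, two-transform approach.

    fresnel_number : pixel Fresnel number F = dx^2 / (lambda * z)
    """
    n = len(field)
    spectrum = dft(field, -1)
    filtered = []
    for m, amplitude in enumerate(spectrum):
        f = m if m < n // 2 else m - n  # signed frequency index
        kernel = cmath.exp(-1j * math.pi * (f / n) ** 2 / fresnel_number)
        filtered.append(amplitude * kernel)
    return [v / n for v in dft(filtered, +1)]

# Assumed example: an 8-pixel-wide slit on a 32-pixel grid, F = 0.1.
slit = [1.0 if 12 <= j < 20 else 0.0 for j in range(32)]
hologram = [abs(v) ** 2 for v in propagate(slit, 0.1)]
print([round(v, 3) for v in hologram])
```

Because the filter has unit modulus, the total intensity is conserved, and a constant input (plane wave) is returned unchanged; both properties are convenient sanity checks for any implementation of this scheme.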

Next, we consider the sampling criteria for the first method. In order to correctly
sample the free-space propagator or reciprocal space chirp, the real space sampling
interval $\Delta x$ has to be [11]

$$\Delta x \geq \frac{\lambda z}{L}, \tag{2.42}$$


where L is the field of view consisting of N pixels: $L = N \Delta x$. Hence, only a large
number N of pixels or a short propagation distance result in aliasing-free sampling 
of the propagated field. To find a remedy, one can artificially increase the number 
of pixels. Yet, this drastically decreases the computational speed. Alternatively, one 
can also write down the impulse response function, i.e. the real space counterpart of 
the reciprocal space chirp: 


k ik(x? + y? 
F (Gk, 5 DIG, y, z) = - exp | ( 2 |. (2.43) 
iz 2z 
The corresponding sampling criterion is [11] 
ÀZ 
Ax < —. (2.44) 
L 


Violation of the last equation results in periodic copies of the chirp interfering
with each other. A way out is to use a suitable window function to limit the chirp
[7, 12]. Following (2.28) and [7], the highest angle under which plane waves can be
emitted is given by

$$\theta_{\max} = \arcsin\left( \frac{\Delta k_{\max}}{k} \right). \tag{2.45}$$

In the discrete version of the Fourier transform, the Shannon sampling theorem
is fulfilled by the relation

$$\Delta k \, \Delta x = \frac{2\pi}{N}. \tag{2.46}$$

The highest resolvable spatial frequency corresponds to an oscillation that extends
over two real space pixels, or equivalently

$$\Delta k_{\max} = \frac{N}{2} \cdot \Delta k = \frac{N}{2} \, \frac{2\pi}{\Delta x N} = \frac{\pi}{\Delta x}. \tag{2.47}$$

Hence, the largest properly sampled angle $\theta_{\max}$ under which radiation is emitted
is given by

$$\theta_{\max} = \arcsin\left( \frac{\lambda}{2 \Delta x} \right). \tag{2.48}$$


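A useful window function to be multiplied with the real space chirp is [7]

$$w(\theta) = \begin{cases} \cos\left( \dfrac{\pi}{2} \dfrac{\theta}{\theta_{\max}} \right) & \text{if } \theta < \theta_{\max}, \\ 0 & \text{else.} \end{cases} \tag{2.49}$$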
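The sampling criteria (2.42) and (2.44) and the angle (2.48) are easily evaluated before choosing a propagation method. The helper below is a small sketch for this purpose; the function name `sampling_report`, its return format and the numerical values are illustrative assumptions, and the pixel size is assumed to be at least half a wavelength so that the arcsine is defined.

```python
import math

def sampling_report(wavelength, z, dx, n_pixels):
    """Evaluate the sampling criteria (2.42), (2.44) and the angle (2.48)."""
    field_of_view = n_pixels * dx                       # L = N * dx
    critical_dx = wavelength * abs(z) / field_of_view   # lambda * z / L
    return {
        "transfer_function_ok": dx >= critical_dx,      # criterion (2.42)
        "impulse_response_ok": dx <= critical_dx,       # criterion (2.44)
        "theta_max": math.asin(wavelength / (2 * dx)),  # largest sampled angle (2.48)
    }

# hypothetical example: 0.1 nm wavelength, 50 nm pixels, 1024 pixels, z = 10 mm
report = sampling_report(wavelength=1e-10, z=1e-2, dx=50e-9, n_pixels=1024)
```

For these values the reciprocal space chirp is properly sampled, while for a tenfold larger distance only the impulse response criterion is met. The cross-over occurs at z = N Δx² / λ.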


Finally, we consider the sampling requirement for the second propagation approach,
based directly on the Fresnel diffraction integral. This requires sampling of the chirp
functions in the source plane:

$$\Delta x \le \frac{\lambda z}{L}. \tag{2.50}$$

In fact, if only the intensity in the destination plane is of interest, the observation
chirp does not need to be sampled. Further, the output field can be regarded as a
function of spatial frequencies rather than of spatial coordinates in a destination
(detector) plane. In this way one can dispense with the observation chirp altogether.


2.1.5 X-ray Propagation in Matter 


After considering propagation in free space, the next challenge is to conceptualize the 
interaction of X-rays and matter as well as to give suitable approximations to compute 
stationary wave propagation in matter. For wave propagation, we are not interested 
in incoherent and inelastic interaction processes such as Compton scattering, nor in 
the cascade of processes following photo absorption of X-ray photons [13, 14], but 
only in the elastic scattering events of atoms which collectively result in diffraction
and refraction effects. For X-ray imaging, matter mainly plays one of two roles: it is
either an optical component or the sample (object of interest). If objects and optical
components are sufficiently thin so that propagation effects inside them (also called
volume effects) can be neglected, their action on the wavefield can be described by a
simple multiplication, as shown below. For a more general case, (2.16) or (2.17) have
to be solved for the paraxial case. This is a formidable task, and analytical solutions
exist only for a few very special cases (sphere, slab waveguide). Neglecting atomic-scale
variations, which would become important only for perfect crystals, we can write the
index of refraction in continuum approximation as in (2.3), with the real-valued
decrement or dispersion term $\delta$ given by [13]


$$\delta(\mathbf{r}) = \frac{r_0 \lambda^2}{2\pi} \rho_a(\mathbf{r}) \left[ Z + f'(E) \right], \tag{2.51}$$

where $\rho_a$ is the number density of atoms, $r_0 = 2.82 \cdot 10^{-15}$ m the Thomson
scattering length, $Z$ the number of electrons in the atom and $f'(E)$ the real part of
the atomic form factor correction at energy $E$. For the imaginary part (absorption
term) $\beta$ we have

$$\beta(\mathbf{r}) = \frac{r_0 \lambda^2}{2\pi} \rho_a(\mathbf{r}) f''(E), \tag{2.52}$$

with $f''(E)$ being the imaginary part of the atomic form factor. Note that form
factors and their dependence on photon energy $E$ are tabulated for each element
in the International Tables for Crystallography and are also available online.¹ Away
from absorption edges, the real part of $n(\mathbf{r})$ only depends on the total electron
density $\rho(\mathbf{r})$ (summed over all elements)

$$\delta = \frac{r_0 \lambda^2}{2\pi} \rho(\mathbf{r}). \tag{2.53}$$
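To get a feeling for the order of magnitude of the decrement in (2.53), the following sketch evaluates it for water at a photon energy of 10 keV. The electron density of water (about 3.34·10^29 m^-3), the value of hc and the function name are assumptions introduced for this illustration; they are not taken from the text.

```python
import math

R0 = 2.82e-15      # Thomson scattering length in m
HC = 1.2398e-6     # h*c in eV*m, used to convert photon energy to wavelength

def delta_from_electron_density(energy_ev, rho_e):
    """Refractive index decrement (2.53), valid away from absorption edges."""
    wavelength = HC / energy_ev
    return R0 * wavelength**2 * rho_e / (2 * math.pi)

# assumed electron density of water in m^-3; the result is of order 2e-6
delta_water = delta_from_electron_density(10e3, 3.34e29)
```

Doubling the photon energy reduces the decrement by a factor of four, reflecting the quadratic wavelength dependence.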


Due to the small value of the Thomson scattering length, resulting in $\delta \ll 1$,
diffraction and refraction effects in the X-ray spectral range are weak, and hence
there is ample room for approximations, notably the ansatz by Born or the ansatz by
Rytov, which are discussed in view of X-ray propagation in [15-17].

¹ http://it.iucr.org/Cb/ch4o2v0001/sec40206/ and http://henke.lbl.gov/optical_constants/.

Within the first order of the Born approximation, we can neglect multiple scattering
if the sample is sufficiently thin and consider the scattered wave $\psi_s(\mathbf{r})$ as a
small additive correction to the primary wave $\psi_0(\mathbf{r})$, i.e.

$$\psi(\mathbf{r}) = \psi_0(\mathbf{r}) + \psi_s(\mathbf{r}). \tag{2.54}$$

For a far field observation point, the scattered wave is given as a Fourier transform
of the scattering length density, i.e. $r_0 \rho$, with $\rho$ being the electron density
and $r_0$ the Thomson scattering length

$$\psi_s(\mathbf{r}) \propto r_0 \int d\mathbf{r}' \, \rho(\mathbf{r}') \exp(i \mathbf{q} \cdot \mathbf{r}'), \tag{2.55}$$


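where $\mathbf{q}$ is the scattering vector. Alternatively, according to the Rytov ansatz,
we can also write the solution in a multiplicative form as

$$\psi(\mathbf{r}) = \psi_0(\mathbf{r}) \exp(\varphi_s(\mathbf{r})), \tag{2.56}$$

where $\varphi_s(\mathbf{r})$ is the complex-valued phase of the scattered field and
$\psi_0(\mathbf{r})$ again the primary wave amplitude without perturbation by the object.
Rytov’s approximation then requires $|(\nabla \varphi_s)^2| \ll |t|$, with the object
function defined as $t(\mathbf{r}) = k_0^2 (1 - n^2(\mathbf{r}))$, where $k_0 = 2\pi/\lambda$.
One can compute $\varphi_s$ from $t$ using the Green’s function
$G(r) = \exp(ikr)/(4\pi r)$ [17]

$$\varphi_s(\mathbf{r}) = -\frac{1}{\psi_0(\mathbf{r})} \int d\mathbf{r}' \, G(\mathbf{r} - \mathbf{r}') \, \psi_0(\mathbf{r}') \, t(\mathbf{r}'), \tag{2.57}$$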


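which in frequency space gives [17]

$$\tilde{\varphi}_s(k_x, k_y, z) = \frac{\exp\left[ iz \left( \sqrt{k_0^2 - k_x^2 - k_y^2} - k_0 \right) \right]}{2i \sqrt{k_0^2 - k_x^2 - k_y^2}} \cdot \tilde{t}\left( k_x, k_y, \sqrt{k_0^2 - k_x^2 - k_y^2} - k_0 \right). \tag{2.58}$$

For paraxial propagation with $\sqrt{k_0^2 - k_x^2 - k_y^2} \approx k_0 \left( 1 - \frac{k_x^2}{2k_0^2} - \frac{k_y^2}{2k_0^2} \right)$ and for
objects with thickness $\Delta z$ sufficiently small it can be shown that [17]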


z 1 AZ yan] = 
Ps (kx, ky, z) = nn Z E tR) | TE, ky, 0). (2.59) 


This last idea of considering the limit of small object thickness is extremely useful
and shall be approached in even simpler terms. To this end, let us first consider a
plane wave $\exp(ikz)$ incident at $z = 0$ onto a homogeneous slab of thickness $\Delta z$.
The wavefield behind the object (outgoing wave) can then be written as

$$\exp\left[ i k_0 \Delta z (1 - \delta + i\beta) \right] = \exp(i k_0 \Delta z) \cdot \exp(-i k_0 \delta \Delta z) \exp(-k_0 \beta \Delta z), \tag{2.60}$$

where $\exp(-i k_0 \delta \Delta z)$ describes the phase shift and
$\exp(-k_0 \beta \Delta z) = \exp(-\mu \Delta z / 2)$ the absorption of the wave in the medium
(absorption coefficient $\mu$). For an inhomogeneous medium and negligible diffraction
inside the specimen, the complex-valued transmission function of a slab can then be
expressed by the integral along the optical path

$$t(x, y) = \exp\left[ -ik \int_0^{\Delta z} \left( \delta(\mathbf{r}) - i\beta(\mathbf{r}) \right) dz \right]. \tag{2.61}$$


The underlying approximation is known as the projection approximation. Its basic 
assumption is that the value of the wavefield is entirely determined by the phase and 
amplitude shifts accumulated along streamlines of the unscattered beam. In doing 
so, the spread of the wave by diffraction inside the sample is ignored. Thus, this is 
correct only down to a certain sample size, depending on the resolution element and 
photon energy E. The projection approximation is valid for sufficiently small spatial 
frequencies, fulfilling 


k- KR- kk, (2.62) 
eee SS AT 


where $\Delta T$ denotes the thickness of the object. Propagation through arbitrary objects
can be treated by sequences of small projection and propagation steps, as if matter
came in the form of thin slices lined up along the optical axis (multi-slice approach). For
further details of this method see [1, 18]. A different approach to solve propagation 
through matter is propagation by finite difference equations which is presented below. 
Finite difference propagation and the multi-slice approach have been compared in [1]. 
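The projection approximation and the multi-slice approach can be contrasted in a short sketch. It is meant as an illustration under stated assumptions rather than as the scheme of [1, 18]: the function names are hypothetical, the index maps are assumed to be sampled on a regular grid of shape (nz, ny, nx), the integral in (2.61) is replaced by a Riemann sum, and the free-space step uses the standard paraxial transfer function.

```python
import numpy as np

def transmission_function(delta, beta, dz, wavelength):
    """Projection approximation (2.61) for index maps of shape (nz, ny, nx)."""
    k = 2 * np.pi / wavelength
    projected = (delta - 1j * beta).sum(axis=0) * dz   # Riemann sum along z
    return np.exp(-1j * k * projected)

def multislice(field, delta, beta, dx, dz, wavelength):
    """Alternate thin-slice projection and free-space steps of length dz."""
    k = 2 * np.pi / wavelength
    ny, nx = field.shape
    kx = 2 * np.pi * np.fft.fftfreq(nx, d=dx)
    ky = 2 * np.pi * np.fft.fftfreq(ny, d=dx)
    KX, KY = np.meshgrid(kx, ky)
    # assumed paraxial free-space propagator over one slice thickness
    step = np.exp(-1j * dz * (KX**2 + KY**2) / (2 * k))
    for d_slice, b_slice in zip(delta, beta):
        # projection through one thin slice
        field = field * np.exp(-1j * k * dz * (d_slice - 1j * b_slice))
        # free-space propagation to the next slice
        field = np.fft.ifft2(np.fft.fft2(field) * step)
    return field
```

For a laterally homogeneous slab illuminated by a plane wave, both functions give the same exit wave, since no diffraction occurs inside the object. Differences appear once the object has lateral structure on scales for which diffraction inside the sample is no longer negligible.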


2.1.6 Propagation by Finite Difference Equations 


Next, we will present the basic scheme of finite difference equations (FD) for propa- 
gation in the framework of the paraxial wave equation. As the paraxial wave equation 
is a parabolic partial differential equation, initial and boundary data are required for 
its solution. But in contrast to elliptic equations such as the Helmholtz equation, 
there is no data required on a closed boundary. In other words, we can propagate 
from left to right along the optical axis, and we expect causality in the sense that the 
field is only determined by what has happened upstream, but not downstream. For 
two dimensional propagation problems (one dimension for the optical axis z plus a 
single lateral direction x), i.e. if the index of refraction is independent of y, following 
(2.19), the parabolic wave equation can be written as 


$$\frac{\partial u}{\partial z} = A \frac{\partial^2 u}{\partial x^2} + C(x, z) \, u, \tag{2.63}$$

where

$$A := -\frac{1}{2ik} \quad \text{and} \quad C(x, z) := -\frac{k}{2i} \left( n^2(x, z) - 1 \right),$$

Fig. 2.4 a Three dimensional distribution of the refractive index in a voxelated space. The
optical axis is parallel to the z axis and X-rays (wavevector k) are impinging on the sample. b Wave
propagation using finite difference equations solved by the Crank-Nicolson algorithm. Boundary
conditions/initial values are represented by thick gray lines. Colored regions denote regions of
different refractive index. Note that because the parabolic wave equation is used, no boundary
conditions for $u(x, z_{N_z + 1})$ need to be set. In order to compute $u_k^{n+1}$, the values
$u_{k-1}^n$, $u_k^n$, $u_{k+1}^n$, $u_{k-1}^{n+1}$ and $u_{k+1}^{n+1}$ are required


with boundary conditions

$$u(x, 0) = u_0(x, 0), \tag{2.64}$$
$$u(x_0, z) = u_0(x_0, z), \tag{2.65}$$
$$u(x_{N_x}, z) = u_0(x_{N_x}, z), \tag{2.66}$$

where $u_0(x, 0)$ is the field in the initial plane, while $u_0(x_0, z)$ and $u_0(x_{N_x}, z)$ are
the bottom and top boundary conditions to be specified along lines parallel to the
optical axis, at the sides of the computational field of view. The computational field
of view is modeled such that it has $N_x + 1$ grid points in the lateral direction $x$,
and $N_z + 1$ grid points along the optical axis. Importantly, and in contrast to elliptic
partial differential equations, no values have to be set for $u_0(x, N_z + 1)$. The
initial-boundary-value problem given in (2.63)-(2.66) can be solved numerically
using finite-difference schemes [3, 19]. Here, the crucial point is the update scheme
(‘stencil’), see Fig. 2.4b. For the two dimensional case (one dimension for the optical
axis, one lateral dimension), the stencil introduced by Crank and Nicolson [20]
gives second order accuracy in the steps $\Delta z$ along the optical axis and $\Delta x$
perpendicular to the optical axis [21]. Accordingly, (2.63) is approximated by the following
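finite-difference expressions [3]

$$\frac{\partial u}{\partial z} \rightarrow \frac{u_k^{n+1} - u_k^n}{\Delta z}, \tag{2.67}$$

$$A \frac{\partial^2 u}{\partial x^2} \rightarrow \frac{A}{2} \left[ \frac{u_{k+1}^n - 2u_k^n + u_{k-1}^n}{\Delta x^2} + \frac{u_{k+1}^{n+1} - 2u_k^{n+1} + u_{k-1}^{n+1}}{\Delta x^2} \right], \tag{2.68}$$

$$C(x, z) \, u \rightarrow \frac{C_k^{n+\frac{1}{2}}}{2} \left( u_k^n + u_k^{n+1} \right), \tag{2.69}$$

with $u_k^n = u(x_k, z_n)$ and $C_k^{n+\frac{1}{2}} = C(x_k, z_{n+\frac{1}{2}})$, resulting in the
finite-difference equation [3]

$$\frac{u_k^{n+1} - u_k^n}{\Delta z} = \frac{A}{2} \left( \frac{u_{k+1}^n - 2u_k^n + u_{k-1}^n}{\Delta x^2} + \frac{u_{k+1}^{n+1} - 2u_k^{n+1} + u_{k-1}^{n+1}}{\Delta x^2} \right) + \frac{C_k^{n+\frac{1}{2}}}{2} \left( u_k^n + u_k^{n+1} \right). \tag{2.70}$$

With the abbreviations

$$a_x := A \frac{\Delta z}{\Delta x^2}, \qquad c_k^{n+\frac{1}{2}} := C_k^{n+\frac{1}{2}} \frac{\Delta z}{2}, \tag{2.71}$$

(2.70) may now be written as a system of $N_x - 1$ linear equations [3]

$$M^n u^{n+1} = d^n \tag{2.72}$$

with

$$M^n = \begin{pmatrix}
1 + a_x - c_1^{n+\frac{1}{2}} & -\frac{a_x}{2} & 0 & \cdots & 0 \\
-\frac{a_x}{2} & 1 + a_x - c_2^{n+\frac{1}{2}} & -\frac{a_x}{2} & & \vdots \\
0 & \ddots & \ddots & \ddots & 0 \\
\vdots & & -\frac{a_x}{2} & 1 + a_x - c_{N_x-2}^{n+\frac{1}{2}} & -\frac{a_x}{2} \\
0 & \cdots & 0 & -\frac{a_x}{2} & 1 + a_x - c_{N_x-1}^{n+\frac{1}{2}}
\end{pmatrix}, \qquad
u^{n+1} = \begin{pmatrix} u_1^{n+1} \\ u_2^{n+1} \\ \vdots \\ u_{N_x-1}^{n+1} \end{pmatrix}$$

and

$$d^n = \begin{pmatrix}
\frac{a_x}{2} u_0^n + \left( 1 - a_x + c_1^{n+\frac{1}{2}} \right) u_1^n + \frac{a_x}{2} u_2^n + \frac{a_x}{2} u_0^{n+1} \\
\frac{a_x}{2} u_1^n + \left( 1 - a_x + c_2^{n+\frac{1}{2}} \right) u_2^n + \frac{a_x}{2} u_3^n \\
\vdots \\
\frac{a_x}{2} u_{N_x-3}^n + \left( 1 - a_x + c_{N_x-2}^{n+\frac{1}{2}} \right) u_{N_x-2}^n + \frac{a_x}{2} u_{N_x-1}^n \\
\frac{a_x}{2} u_{N_x-2}^n + \left( 1 - a_x + c_{N_x-1}^{n+\frac{1}{2}} \right) u_{N_x-1}^n + \frac{a_x}{2} u_{N_x}^n + \frac{a_x}{2} u_{N_x}^{n+1}
\end{pmatrix}.$$
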
Since $d^n$ has no field values with index $n + 1$ except for $u_0^{n+1}$ and $u_{N_x}^{n+1}$, which are
set by the boundary conditions (2.65) and (2.66), it is possible to compute the field
in plane $n + 1$ from plane $n$ by solving a system of $N_x - 1$ linear equations.
Furthermore, because $M^n$ is tridiagonal, this can be carried out with $O(N_x)$ operations
[21]. Finally, the process is repeated sequentially for all $N_z$ grid points, resulting
in numerical complexity $O(N_x \times N_z)$. The update scheme for propagation in three
dimensions (3d) (i.e. 2d + 1d) is described in [1, 22, 23].
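A single Crank-Nicolson update for the two dimensional case can be sketched as follows. This is a minimal illustration rather than the implementation of [3]: the function names are hypothetical, and the coefficients are assumed to be A = -1/(2ik) and C = -(k/2i)(n² - 1), as they follow from the paraxial wave equation for a carrier wave exp(ikz). The tridiagonal system is solved with the Thomas algorithm, which needs a number of operations proportional to the number of lateral grid points.

```python
import numpy as np

def solve_tridiagonal(lower, diag, upper, rhs):
    """Thomas algorithm; lower[0] and upper[-1] are not used."""
    n = len(diag)
    c = np.zeros(n, dtype=complex)
    d = np.zeros(n, dtype=complex)
    c[0] = upper[0] / diag[0]
    d[0] = rhs[0] / diag[0]
    for i in range(1, n):                      # forward elimination
        denom = diag[i] - lower[i] * c[i - 1]
        c[i] = upper[i] / denom
        d[i] = (rhs[i] - lower[i] * d[i - 1]) / denom
    x = np.zeros(n, dtype=complex)
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):             # back substitution
        x[i] = d[i] - c[i] * x[i + 1]
    return x

def crank_nicolson_step(u, n_mid, k, dx, dz, left_next, right_next):
    """Advance u (values at x_0 .. x_Nx in plane n) to plane n + 1.

    n_mid holds the refractive index at (x_k, z_{n+1/2}); left_next and
    right_next are the boundary values in plane n + 1.
    """
    u = np.asarray(u, dtype=complex)
    A = -1.0 / (2j * k)                        # assumed sign convention
    C = -(k / 2j) * (np.asarray(n_mid) ** 2 - 1.0)
    a = A * dz / dx**2
    c = C[1:-1] * dz / 2
    m = len(u) - 2                             # number of interior points
    diag = 1 + a - c
    off = np.full(m, -a / 2)
    rhs = (a / 2) * u[:-2] + (1 - a + c) * u[1:-1] + (a / 2) * u[2:]
    rhs[0] += (a / 2) * left_next              # boundary terms of plane n + 1
    rhs[-1] += (a / 2) * right_next
    u_next = np.empty_like(u)
    u_next[0], u_next[-1] = left_next, right_next
    u_next[1:-1] = solve_tridiagonal(off, diag, off, rhs)
    return u_next
```

A simple consistency check is a plane wave in a homogeneous medium: with boundary values set to the analytical solution exp(Cz), the numerical field should follow the same phase shift and attenuation as in (2.60), up to the second order discretization error.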


2.2 Coherent Image Formation 


In early years, electron microscopy was severely limited by aberrations of elec- 
tromagnetic objective lenses. To overcome these difficulties, D. Gabor proposed 
lens-less imaging by inline holography in 1948, inspired by L. Bragg’s ideas of a 
coherent projection X-ray microscope, see [24] for a historical perspective. Gabor 
demonstrated his idea of coherent imaging without optics with visible light [25]. 
Instead of recording sharp images with a lens-based system, the (near field) inter- 
ference pattern behind a (semi-transparent) object was recorded on a photographic 
plate. The pattern originates from interference of the direct beam passing through the 
object with the secondary waves diffracted by the object. Illuminating the (reversed) 
photographic plate, i.e. using the photographic plate as an object, resulted in a clear
image of the sample. In Gabor’s time, this type of holographic reconstruction was 
by necessity of analogue optical nature, whereas nowadays a hologram recorded by 
a detection device can be reconstructed numerically. 

While in electron microscopy the lens problems were eventually overcome, a sim- 
ilar challenge appeared in X-ray microscopy: X-ray lenses (diffractive or refractive) 
lack the efficiency and numerical aperture which we are used to from visible light 
lenses. At the same time, X-rays are attractive as a microscopy probe owing to short 
wavelength and hence potentially high resolution as well as high penetration power. 
Therefore, Gabor’s idea of coherent lens-less imaging re-emerged in X-ray optics and
microscopy once radiation of high brilliance and sufficient coherence became
available from synchrotron sources, in particular after the invention of undulators. In
contrast to coherent diffractive imaging (CDI), where the primary beam is blocked
behind the sample and the diffracted radiation is recorded without interference with
the primary or reference wave, and in direct analogy to Gabor’s setup, X-ray holography
is based on the interference between primary and scattered waves on the detector
(see Fig. 2.5).

Fig. 2.5 Illustration of coherent diffractive imaging (optical far field) and holographic full field
imaging (optical near field). a In conventional diffraction from crystalline or non-crystalline
specimen, intensity scattered to high angles is measured and the direct beam is blocked. b Contrarily,
holography uses the interference between diffracted waves and the direct beam, such that phase
information is encoded in the intensity pattern. Note that the techniques require different detectors:
single photon counting detectors with a large numerical aperture are preferred for (a), while (b)
requires high resolution detection devices such as scintillator-based cameras


2.2.1 Holographic Imaging in Full Field Setting 


Inline holography is a full field imaging technique, in which many resolution elements 
of object and detector are illuminated in parallel, typically employing a mega-pixel 
detector with sufficiently small pixel size to record the fine interference fringes 
between scattered and primary waves. While the field of view (FOV) can of course 
be further increased by lateral scanning, an image of large FOV can be acquired 
in a single exposure. This is in contrast to coherent techniques which are based on 
scanning the object in a focused beam, or which require a fully coherent illumination 
and are therefore limited to a correspondingly small field of view. For this reason, 
holographic imaging is of significant advantage in particular for tomography, where 
compared to the net counting time, motor overhead and detector readout for three 
degrees of freedom become a dominant time factor in recording data. Furthermore, 
time-resolved imaging is more easily implemented in parallel than serial acquisition. 

Two geometries are commonly used for inline X-ray holography: (quasi-)parallel
and (quasi-)spherical illumination. The first case is implemented at synchrotron
beamlines using almost the full undulator beam without focusing. The divergence is
then small enough so that in the object and detector plane the beam is almost of the
same lateral size. The FOV is adjusted by slits upstream from the object. This geometry
is used for holography/tomography of large objects with resolution elements at
the size of the detector pixels.

Fig. 2.6 a Cone-beam setting for holographic full field imaging. The object is illuminated by
diverging wavefronts. b Placing the sample at different defocus positions results in oscillatory
contrast of spatial frequencies in the recorded holograms. c-e Illustration of the effect of geometrical
magnification: Placing the sample close to the focus (c) increases magnification and the number of
contrast oscillations when plotting the CTF as a function of spatial frequency (dark blue curve in
b). Placing the sample closer to the detector (d-e) decreases magnification and reduces the number
of contrast oscillations (light blue and cyan curves in b)

The second case is cone beam illumination used for high resolution holographic 
imaging, since it offers variable geometric magnification and FOV by adjusting 
the distance between focus and object, the so-called defocus distance. Resolution 
elements are of the detector pixel size scaled by the inverse geometric magnification. 
Figure 2.6 illustrates magnification and contrast evolution for a phantom consisting 
of an assembly of spheres. The Fresnel scaling theorem was used to compute the 
contrast in an effective parallel beam setup which is numerically more convenient.


2.2.2 Contrast in X-ray Holograms


As discussed above, elastic interaction of light and matter can be accounted for by
the complex-valued index of refraction, which we rewrite here as

$$n(\lambda) = 1 - \delta(\lambda) + i\beta(\lambda) \tag{2.73}$$

to stress its wavelength dependency. For forward scattered radiation and away from
absorption edges, $\delta$ is directly proportional to the electron density $\rho$ of the sample
[13]

$$\delta(\lambda) = \frac{r_0}{2\pi} \lambda^2 \rho, \tag{2.74}$$

with the classical electron radius $r_0$. In this (often warranted) approximation, $\delta$ varies
quadratically with wavelength, independent of sample stoichiometry, in contrast to
the imaginary part of the index of refraction $\beta(\lambda)$, which strongly depends on the
elemental composition and also exhibits much larger jump discontinuities at absorption
edges. For hard X-rays and low-Z materials such as in soft and biological matter,
$\delta(\lambda)$ is up to three orders of magnitude larger than $\beta(\lambda)$. For this reason,
interference contrast in holograms due to phase shift often dominates over absorption
contrast. By enhancing radiographic imaging with phase contrast, fine details of weakly
absorbing components become detectable, which would otherwise be invisible in classical
absorption based X-ray radiography. The effect of the object on the phase of the
so-called exit wave $\psi(x, y)$ directly behind the object becomes apparent by considering
a monochromatic plane wave (wavelength $\lambda$) passing through an object
which is homogeneous in $z$ and has a thickness $\Delta T$:


$$\psi(x, y) = \exp\left[ -ik \, \delta(\lambda, x, y) \, \Delta T \right] \cdot \exp\left( -\frac{1}{2} \mu(\lambda, x, y) \, \Delta T \right), \tag{2.75}$$

where $k = \frac{2\pi}{\lambda}$ and $\mu(\lambda, x, y)$ is the absorption coefficient with $\mu = 2k\beta$. Neglecting
absorption and replacing $\delta$ by (2.74) results in

$$\psi(x, y) \approx \exp\left[ -i r_0 \lambda \rho(x, y) \Delta T \right]. \tag{2.76}$$

Phase retrieval techniques aim at recovering the phase in the exit plane

$$\varphi(x, y, \lambda) = -r_0 \lambda \rho(x, y) \Delta T \tag{2.77}$$

from measured intensity distributions. By (2.77), it becomes clear that the phase in
the exit plane is directly proportional to the local electron density, if the object is
homogeneous in $z$. More generally, for inhomogeneous (thin) objects, the phase is
proportional to the projected electron density, i.e. to $\int \rho(x, y, z) \, dz$.
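As a rough numerical illustration of (2.77), consider a homogeneous layer of water of thickness 10 µm at a photon energy of 10 keV. The electron density of water (about 3.34·10^29 m^-3) and the corresponding wavelength (about 0.124 nm) are assumed values introduced for this example only:

```latex
\varphi = -r_0 \lambda \rho \, \Delta T
\approx -\left( 2.82 \cdot 10^{-15}\,\mathrm{m} \right)
        \left( 1.24 \cdot 10^{-10}\,\mathrm{m} \right)
        \left( 3.34 \cdot 10^{29}\,\mathrm{m}^{-3} \right)
        \left( 10^{-5}\,\mathrm{m} \right)
\approx -1.17\ \mathrm{rad}
```

Even a thin, essentially non-absorbing object thus imprints a phase shift of order one radian on the exit wave.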

The lateral distribution of $\varphi(x, y)$ then results in diffraction behind the object, and
as the wave propagates, the phase information is converted to an intensity pattern.
The relative contrast of features on different length scales (spatial frequencies) is
found to change in a characteristic manner with the free-space propagation distance
between object and detector, or more generally with Fresnel number. This evolution
of contrast can be analyzed by inspecting the Fourier transform of the hologram.
What one finds is an oscillatory contrast as a function of spatial frequency. For an
optically weak object (i.e. weakly varying phase and small absorption) it can be
shown that in Fourier space, intensity is given as a linear combination of the object’s
phase and absorption map [26-29]

$$\mathcal{F}\left[ I(x, y, z)/I_0 - 1 \right] = 2 \sin\left( \frac{\lambda z}{4\pi} \left( k_x^2 + k_y^2 \right) \right) \mathcal{F}\left[ \varphi(x, y, \lambda) \right] - 2 \cos\left( \frac{\lambda z}{4\pi} \left( k_x^2 + k_y^2 \right) \right) \mathcal{F}\left[ \mu(x, y, \lambda) \right], \tag{2.78}$$

with $z$ the distance between sample and detector and $k = 2\pi\nu$ the angular spatial
frequency in the object plane. To obtain this linear relationship, the intensity is
normalized by the incident intensity $I_0$ and one is subtracted. If the diffraction of
the illuminating wave is neglected, the expression can be generalized to an inhomogeneous
illumination field $I_0 \to I(x, y, 0)$, see [30, 31] for a discussion of empty beam division.
The structure of (2.78) suggests defining linear filter functions describing the evolution
of contrast. For the case of weak phase objects (i.e. with weakly varying phase and
negligible absorption), the contrast transfer function (CTF) can be defined as


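$$\mathcal{F}\left[ I(x, y, z)/I(x, y, 0) - 1 \right] = \mathrm{CTF}_p(\nu_x, \nu_y, F) \cdot \mathcal{F}\left[ \varphi(x, y, \lambda) \right], \tag{2.79}$$

where $\mathcal{F}[\varphi(x, y, \lambda)]$ is the Fourier transform of the object function and

$$\mathrm{CTF}_p(\nu_x, \nu_y, F) = 2 \sin\left[ \frac{\pi}{F} \left( \nu_x^2 + \nu_y^2 \right) \right]. \tag{2.80}$$

If phase and absorption are proportional in each pixel (single material object),
this can be generalized to [26, 28, 29]

$$\mathrm{CTF}(\nu_x, \nu_y, F) = 2 \left\{ \sin\left[ \frac{\pi}{F} \left( \nu_x^2 + \nu_y^2 \right) \right] + \frac{\beta}{\delta} \cos\left[ \frac{\pi}{F} \left( \nu_x^2 + \nu_y^2 \right) \right] \right\}. \tag{2.81}$$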


Once absorption becomes important, the cosine term is non-negligible and enters with a
scaling given by the ratio $\beta/\delta$, i.e. the ratio between the energy dependent absorbing
and phase shifting properties of the sample [28, 29]. Note also that spatial frequencies
$(\nu_x, \nu_y)$ are now expressed in natural dimensionless units, by $\nu = ka/(2\pi)$ with $a$ the
pixel size. In this way, the dependence on the Fresnel number $F = \frac{a^2}{\lambda z}$ is highlighted.
The spacing of discrete sampling points in reciprocal space is $\Delta\nu = 1/N$, with $N$ being
the number of pixels in horizontal or vertical direction. An illustration is provided in
Fig. 2.6, showing holograms simulated for different distances and magnifications in
cone-beam geometry (a), with the corresponding phase $\mathrm{CTF}_p$ (only the sine part of the
CTF) for the three indicated positions or equivalently $F$ (b). Holograms for the three
object positions are shown in (c-e). When the Fresnel number is large, the image
contrast is dominated by edges (edge enhancement). For example, the cyan curve
in (b) is maximal at high spatial frequencies and hence edges in (e) are enhanced
with respect to areas. Shifting the sample closer to the source (light blue and dark
blue curves in (b) and holograms in (c-d)) shows oscillating contrast for lower image
frequencies than in (e).
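The contrast transfer functions (2.80) and (2.81) are straightforward to evaluate numerically. The following sketch uses the natural units introduced above (spatial frequencies in inverse pixels, Fresnel number with respect to the pixel size); the function name `ctf` and its keyword argument are hypothetical.

```python
import numpy as np

def ctf(nu_x, nu_y, fresnel_number, beta_over_delta=0.0):
    """Contrast transfer function (2.81); reduces to (2.80) for beta_over_delta = 0."""
    chi = np.pi * (nu_x**2 + nu_y**2) / fresnel_number
    return 2 * (np.sin(chi) + beta_over_delta * np.cos(chi))

# radial profile of the pure phase CTF on the frequency grid of a 512 pixel detector
nu = np.fft.rfftfreq(512)                 # 0 ... 0.5 in steps of 1/N
profile = ctf(nu, 0.0, fresnel_number=1e-3)
```

For a pure phase object the transfer vanishes at zero frequency, reaches its first maximum at ν² = F/2 and its first zero crossing at ν² = F, so that a smaller Fresnel number shifts the contrast oscillations towards lower spatial frequencies.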


2.3 Solving the Phase Problem in the Holographic Regime 


Consider holographic intensity distributions as shown in Fig. 2.6. If phase and amplitude
were directly measurable by a detector, the complex wavefield could be numerically
back-propagated to obtain the wavefield in the exit plane of the object, where
the phase represents a sharp image of the projected electron density. Direct measurement
of the phase is of course impossible, as the frequency of hard X-rays
is on the order of $10^{18}$ Hz. However, the intensity pattern in the detection plane is
directly related to the phase in the exit plane. Unfortunately, the corresponding set
of equations is far too large to be solved directly, as the number of equations equals the
number of pixels. Furthermore, the interference between primary and scattered wave and
the self-interference of the scattered wave render the equations non-linear. Finally, given
a single detector image, the system of equations is under-determined, because the
amplitude and phase maps in the exit plane contain twice as many unknowns as the
available (real-valued) intensity map in the detection plane. The first concern of
holographic imaging must therefore be to design the experiment such that the measured
data is sufficient, i.e. to achieve uniqueness. Several detector images with sufficient
diversity in the data can be generated by variation of the Fresnel number, but in
practice a second strategy is more common: the number of unknowns is reduced
when sufficient constraints can be formulated, restricting the solution. The phase
problem becomes manageable, for example, when the phase map can be set to zero
outside a known support (support constraint), when absorption in the object is
negligible and the amplitude can be assumed to be one everywhere (pure phase contrast
constraint), or when the object is approximated to have identical stoichiometry
throughout, resulting in a coupling of phase and absorption (single material constraint).

Once this first concern of sufficient data and constraints is met, the second concern 
is to decode the holographic images and retrieve the phase map, i.e. the solution of 
the phase problem as an inverse problem. This phase retrieval process requires know- 
ledge about image formation (i.e. the forward problem based on the wave equation), 
including all experimental parameters, as well as about the constraints which can be 
formulated based on properties of the object. In the following, two basic approaches of 
phase retrieval will be introduced: Deterministic single step and iterative algorithms 
using alternating projections onto constraint sets.


2.3.1 Single-Step Phase Retrieval


Single-step deterministic reconstruction techniques invert the process of holographic 
image formation under certain approximations. An example is phase retrieval based 
on the contrast transfer function (CTF) given in (2.81). In cases where the Fourier 
transform of a hologram can be expressed as a product of the object function and the 
oscillating CTF 


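$$\mathcal{F}\left[ I(\cdot, \cdot, z)/I(\cdot, \cdot, 0) - 1 \right](\nu_x, \nu_y, z) = \mathrm{CTF}(\nu_x, \nu_y, F) \cdot \mathcal{F}\left[ \varphi(\cdot, \cdot, \lambda) \right](\nu_x, \nu_y, \lambda), \tag{2.82}$$

a simple ansatz is to divide the Fourier transformed intensities by the analytically
determined CTF [28]:

$$\varphi(x, y, \lambda) = \mathcal{F}^{-1}\left[ \frac{\mathcal{F}\left[ I(\cdot, \cdot, z)/I(\cdot, \cdot, 0) - 1 \right]}{\mathrm{CTF}(\cdot, \cdot, F)} \right](x, y, \lambda). \tag{2.83}$$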


This separation firstly requires one of the following three constraints: pure phase 
contrast, pure amplitude contrast, or single material (coupled contrast). Secondly, 
an analytical expression for the CTF is only possible by linearization of the optical 
properties of the sample. Note that the original assumption of a weak phase shift in the 
derivation of the CTF was later lifted to a weakly varying phase [27]. Furthermore, 
even if the assumptions are justified, proper implementation of CTF-based phase 
retrieval requires regularization in order to compensate for the zero-crossings of the 
CTF [26, 28, 29]. A detailed description on the implementation of the CTF formalism 
including the extension to multi-distances can be found in [32]. 
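A regularized version of the division in (2.83) can be sketched as follows for a pure phase object and a single distance. The Tikhonov-type regularization with a constant parameter `alpha` is one simple choice made for this illustration and is not necessarily the scheme used in [26, 28, 29, 32]; the function name is hypothetical, and the hologram is assumed to be already normalized by the empty beam intensity.

```python
import numpy as np

def ctf_phase_retrieval(hologram, fresnel_number, alpha=1e-6):
    """Single-distance CTF inversion for a weak pure phase object.

    hologram: normalized intensity I(x, y, z) / I(x, y, 0)
    alpha:    assumed Tikhonov-type regularization parameter
    """
    ny, nx = hologram.shape
    # spatial frequencies in natural units (inverse pixels)
    NX, NY = np.meshgrid(np.fft.fftfreq(nx), np.fft.fftfreq(ny))
    ctf_p = 2 * np.sin(np.pi * (NX**2 + NY**2) / fresnel_number)
    data = np.fft.fft2(hologram - 1.0)
    # regularized division: avoids blow-up at the zero crossings of the CTF
    phase_hat = ctf_p * data / (ctf_p**2 + alpha)
    return np.real(np.fft.ifft2(phase_hat))
```

Because the phase CTF vanishes at zero spatial frequency, the mean value of the phase is not transferred into the hologram; with this regularization the reconstruction therefore has zero mean, and frequencies close to the zero crossings are damped rather than amplified.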


2.3.2 Iterative Phase Retrieval 


Analytic, single step reconstruction techniques are quick but come at a price: They 
require very restrictive assumptions or approximations, as detailed above. The appli- 
cability of phase retrieval can be significantly extended by iterative methods which 
are computationally expensive but compatible with a wider range of constraints and 
valid for more general X-ray optical properties of the object. Iterative algorithms
cycle between object and detector plane, subjecting the wavefield alternately to an object
constraint and to the constraint that the solution has to satisfy the measured data. In the
following, some of the most common constraints will be introduced. 

Compatibility with the measured data is assured by the so-called magnitude
constraint: A solution to the phase problem must satisfy the measured intensity
distribution of the hologram $I(x, y)$. To this end, the wavefield $\psi(x, y)$ is modified
such that

$$[P_M \psi](x, y) := [D_{-z} \hat{\psi}](x, y), \qquad \hat{\psi}(x, y) = \sqrt{I(x, y)} \, \frac{[D_z \psi](x, y)}{\left| [D_z \psi](x, y) \right|}. \tag{2.84}$$

Note that in order to formulate phase retrieval only in terms of projections and
to treat all constraints on equal footing, the Fresnel propagation $D_z$ of the wavefield
between object and detector (and back) is incorporated into the projection operator.
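The magnitude constraint can be sketched as follows. The propagator is written in natural units with an assumed paraxial kernel, back-propagation is realized by changing the sign of the Fresnel number, and the names `propagate` and `project_magnitude` as well as the small constant `eps` (guarding against division by zero) are hypothetical choices for this illustration.

```python
import numpy as np

def propagate(psi, fresnel_number):
    """Assumed paraxial Fresnel propagator in pixel units; -F propagates back."""
    ny, nx = psi.shape
    NX, NY = np.meshgrid(np.fft.fftfreq(nx), np.fft.fftfreq(ny))
    kernel = np.exp(-1j * np.pi * (NX**2 + NY**2) / fresnel_number)
    return np.fft.ifft2(np.fft.fft2(psi) * kernel)

def project_magnitude(psi, intensity, fresnel_number, eps=1e-12):
    """Magnitude constraint: replace the detector amplitude by the measured one."""
    detector = propagate(psi, fresnel_number)
    # keep the current phase estimate, enforce the measured amplitude
    detector = np.sqrt(intensity) * detector / np.maximum(np.abs(detector), eps)
    return propagate(detector, -fresnel_number)
```

By construction, a wavefield that already reproduces the measured intensity is left unchanged, which is the defining property of a projection.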

Next, we need at least a second constraint, notably in the object plane (or more 
precisely the exit plane). A very general constraint is the range constraint, which 
sets the magnitude behind the object strictly equal to one (after normalization to 
incoming intensity). In other words, one assumes the object to be of pure phase 
contrast. In a more general setting one solely requires the amplitude to be smaller 
than one (i.e. thereby fixing the range of the amplitude). This is justified, since right 
behind the object, no interference effects have yet evolved, the wavefield amplitudes 
can only be smaller (absorbing components in the object) than or equal (transparent 
components) to unity 


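$$[P_R \psi](x, y) := \begin{cases} \dfrac{\psi(x, y)}{|\psi(x, y)|} & \text{if } |\psi(x, y)| > 1, \\ \psi(x, y) & \text{else.} \end{cases} \tag{2.85}$$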


However, some caution is advised. Application of the range constraint requires 
normalization of intensity by an independent measurement of the empty beam inten- 
sity and not just normalization by the mean intensity. 
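Both variants of the range constraint described above can be written compactly. In the sketch below, the function name and the flag `pure_phase` are hypothetical; the wavefield is assumed to be normalized by an independently measured empty beam.

```python
import numpy as np

def project_range(psi, pure_phase=False):
    """Range constraint in the exit plane.

    pure_phase=True sets the amplitude to one everywhere (pure phase object);
    otherwise the amplitude is only limited to values not larger than one.
    """
    amplitude = np.abs(psi)
    if pure_phase:
        target = np.ones_like(amplitude)
    else:
        target = np.minimum(amplitude, 1.0)
    # keep the phase, replace the amplitude
    return target * np.exp(1j * np.angle(psi))
```

In both cases the phase of the current estimate is retained and only the amplitude is modified.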

Another straightforward but not always suitable constraint is the support con- 
straint. The reconstructed sample is only allowed to cover a limited part D of the 
field of view 


$$[P_S \psi](x, y) := \begin{cases} \psi(x, y) & \text{if } (x, y) \in D, \\ 0/1 & \text{else (Fourier/Fresnel imaging).} \end{cases} \tag{2.86}$$


In phase retrieval, this constraint is very powerful, for example in view of recov- 
ering spatial frequencies corresponding to zeros in the CTF [33]. Unfortunately, it is 
not applicable to extended samples. 
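The support constraint amounts to a simple masking operation. In the sketch below, the boolean array `support` marks the region D, and the argument `outside` selects the value assigned elsewhere (zero for far field, one for near field imaging, following (2.86)); all names are hypothetical.

```python
import numpy as np

def project_support(psi, support, outside=1.0):
    """Support constraint: keep psi inside the support D, reset it elsewhere."""
    return np.where(support, psi, outside)
```

In the holographic (Fresnel) regime the value outside the support is one, corresponding to the unperturbed, normalized illumination.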

A compact support can be regarded as a special case of sparsity. Obviously, the 
pixels with non-zero density are sparse, if the support is small. More generally, the 
object may be sparse in very different ways, i.e. the object or its projection may be 
specified by a set of independent values much smaller than the number of pixels 
(voxels). Sparsity constraints enforce image properties to be sparse in some sense, 
without being too restrictive and specific. A suitable way to enforce sparsity for 
X-ray holography is the shearlet constraint [34]. Shearlets are deformed (scaled, 
translated and sheared) wavelet-type basis functions [35, 36]. They are particularly 
useful to represent so-called cartoon-like images (compactly supported and twice 
continuously differentiable functions) [37-39]. Hence, if amplitude and
phase of the object’s transmission function can be categorized as cartoon-like, a sparse 
representation by a linear combination of shearlets is possible, and this information
can be used as an additional constraint for phase retrieval [34]. Yet the applicability
of the shearlet constraint is not limited to cartoon-like objects, but the set of shearlets 
required is much smaller in this case, and the constraint can therefore be formulated 
more ‘strictly’, and will thus be more powerful for phase retrieval. 

If no reasonable constraint can be formulated for the object, i.e. if the object is 
extended, exhibits uncoupled variations in phase and amplitude and is not sparse, 
additional data has to be acquired and used as input for phase retrieval. One way to 
do this is by translations of the object or detector (either longitudinal or lateral) and 
successive image acquisitions. Alternating projections on such multi-measurements 
by various update schemes are denoted as multi-magnitude projection (mmp) [30, 
40, 41]. In mmp, the illumination (probe) has to be perfect or known beforehand, 
for example by a complete mmp series acquired at different detector positions [40]. 
This is often difficult to accomplish. 

A suitable alternative is to generate multi-measurements by translation of the 
object and to use ptychographic algorithms for phasing by enforcing separability of 
object and probe afterwards. This so-called separability constraint enables simul- 
taneous phase retrieval of object and probe, and can thereby account for the fact that 
aberrations inherent in the illumination interfere with the modulations imposed by 
the sample, which otherwise result in degradation of image quality and resolution 
[30, 31]. The separability constraint is key in all ptychographic algorithms [42-44],
and was introduced in X-ray holography in several different ways, based on using 
either a wavefront diffuser and lateral translations [45], or longitudinal translations 
[46], or a combination of lateral and longitudinal translations [47]. Note that the 
last case is least restrictive in terms of probe properties, see also [48] for a detailed 
comparison and discussion. 

Figure 2.7 presents a schematic of an iterative phase retrieval algorithm. It is composed
of alternating projections and reflections onto sets of functions that fulfill two
or more constraints. The goal is to decode the holographic intensities
and reveal a solution proportional to the electron density distribution of the sample.
The precise update scheme of a specific reconstruction algorithm (step 6 in Fig. 2.7)
has a significant effect on the convergence. As an illustration, three common update
techniques will be briefly introduced.

The oldest such schemes are known as Gerchberg-Saxton-type (GS) algorithms 
[49] and consist of alternating projections onto sets of functions fulfilling magnitude 
or range-constraints as given in (2.84). Holograms $M_1(x, y)$ and $M_2(x, y)$ recorded
at two different defocus positions provide the required data, but a single measurement 
can be sufficient, for example in case of pure phase contrast. The complete algorithm 
can be written as 


$$\psi^{(n+1)}(x, y) = [P_{M_1} P_{M_2} \psi^{(n)}](x, y). \qquad (2.87)$$


Alternating projections are stopped once a fixed point solution $\psi^{(n+1)}(x, y) =
\psi^{(n)}(x, y)$ is found. In case of a pure phase object, the range constraint for the object
domain is replaced by (2.85). It can be shown that the GS algorithm corresponds to a 
local optimization, which is therefore highly dependent on the initial guess. A very 
similar algorithm based on the iterative application of two projections is the Error
Reduction Algorithm, which alternates the magnitude (measurement) constraint and
a support constraint.

Fig. 2.7 Illustration of iterative phase retrieval: An initially guessed wavefield $\psi$ exiting the object
of interest (1.) is forced to fulfill constraints right behind the sample (2.), has to fulfill the wave
equation (3. and 5.), has to match the measured intensities (4.) and is updated to form a new guessed
field $\psi$ (6.)
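
The alternation of a measurement projector and an object-domain projector can be made concrete in a few lines. The following Python sketch is an illustration only and not the implementation used for the results in this chapter: the function names, the grid, the wavenumber and the propagation distance are assumptions, free-space propagation is done with an angular-spectrum kernel, and the support projector sets the field outside D to one (Fresnel case of the support constraint).

```python
import numpy as np

def propagate(psi, z, k0, dx):
    """Angular-spectrum free-space propagation of a 2d field over distance z."""
    ky = 2 * np.pi * np.fft.fftfreq(psi.shape[0], d=dx)
    kx = 2 * np.pi * np.fft.fftfreq(psi.shape[1], d=dx)
    kz = np.sqrt((k0**2 - kx[None, :]**2 - ky[:, None]**2).astype(complex))
    return np.fft.ifft2(np.exp(1j * z * kz) * np.fft.fft2(psi))

def project_magnitude(psi, M, z, k0, dx):
    """Measurement projector: keep the detector-plane phase, replace the modulus by M."""
    det = propagate(psi, z, k0, dx)
    det = M * np.exp(1j * np.angle(det))
    return propagate(det, -z, k0, dx)

def project_support(psi, support):
    """Support projector (Fresnel case): unperturbed wave, value 1, outside D."""
    return np.where(support, psi, 1.0 + 0.0j)

def error_reduction(M, support, z, k0, dx, n_iter=100):
    """Alternate the two projectors, starting from a flat wavefield."""
    psi = np.ones(M.shape, dtype=complex)
    for _ in range(n_iter):
        psi = project_support(project_magnitude(psi, M, z, k0, dx), support)
    return psi

# Toy data (all numbers are arbitrary assumptions): a weak phase disc in a square support.
n, dx, k0, z = 64, 1.0, 2 * np.pi / 0.1, 200.0
yy, xx = np.indices((n, n)) - n // 2
support = (np.abs(xx) < 12) & (np.abs(yy) < 12)
truth = np.exp(1j * (-0.3 * (xx**2 + yy**2 < 8**2)))
M = np.abs(propagate(truth, z, k0, dx))      # simulated hologram modulus
recon = error_reduction(M, support, z, k0, dx)
```

The quality of `recon` depends on the initial guess and on how restrictive the support is, in line with the local character of this scheme.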

To avoid stagnation in local minima, a more general update scheme can be formulated,
with non-local search properties. The first such example was provided by
the Hybrid-Input-Output algorithm, which was originally formulated for far field
coherent diffractive imaging and objects limited to a finite support [50]. The main
idea is to formulate the update as a linear combination of the wavefield of the previous
iterate (the input), and the wavefield resulting from projection onto the measurement
and application of a support constraint (the output). This linear combination is governed
by the parameter $\beta \in [0, 1]$, gradually pulling the wavefield outside D to zero

$$\psi^{(n+1)}(x, y) = \begin{cases} [P_M \psi^{(n)}](x, y) & \text{if } (x, y) \in D \\ \psi^{(n)}(x, y) - \beta [P_M \psi^{(n)}](x, y) & \text{else.} \end{cases} \qquad (2.88)$$


As a consequence, the iterate $\psi^{(n+1)}(x, y)$ fulfills neither the magnitude
constraint nor the support constraint exactly. Yet, it can overcome stagnation
and finally find a solution that is consistent with both support and range. Whereas
the classical HIO algorithm was designed for far field imaging (i.e. the propagated 
wavefields are calculated by Fourier transforms), a modified Hybrid-Input-Output 
algorithm was formulated for full field holography [33]. The differences from its original
version are the replacement of the Fourier transforms by the Fresnel near field
propagator and the fact that both phase and amplitude of the wavefield outside the
support are subjected to a support constraint [33], pulling the amplitudes to one and
the phases to zero.
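
As an illustration of the Hybrid-Input-Output update, the following Python sketch writes one step as a function of a generic measurement projector. It is a minimal example under stated assumptions: the names `hio_update` and `make_fourier_magnitude_projector` are hypothetical, the far-field (Fourier) modulus projector stands in for the measurement constraint, and the toy object is random.

```python
import numpy as np

def hio_update(psi, P_M, support, beta=0.9):
    """One HIO step: P_M psi inside the support D, psi - beta * P_M psi outside."""
    pm = P_M(psi)
    return np.where(support, pm, psi - beta * pm)

def make_fourier_magnitude_projector(M):
    """Far-field measurement projector: impose the modulus M in Fourier space."""
    def P_M(psi):
        F = np.fft.fft2(psi)
        return np.fft.ifft2(M * np.exp(1j * np.angle(F)))
    return P_M

# Toy problem (assumption): random non-negative object on a square support.
rng = np.random.default_rng(1)
n = 32
support = np.zeros((n, n), dtype=bool)
support[10:22, 10:22] = True
obj = np.where(support, rng.random((n, n)), 0.0)
P_M = make_fourier_magnitude_projector(np.abs(np.fft.fft2(obj)))

psi = rng.random((n, n)).astype(complex)   # random initial guess
for _ in range(200):
    psi = hio_update(psi, P_M, support, beta=0.9)
```

A wavefield that already satisfies both constraints is left unchanged by `hio_update`, which is the fixed-point property one expects from the update rule.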

The Relaxed-Averaged-Alternating-Reflections (RAAR) algorithm [51] combines
projections P and reflections $R = 2P - \mathrm{Id}$. It is designed such that

$$\psi^{(n+1)}(x, y) = \left( \frac{\lambda_n}{2} (R_O R_M + \mathrm{Id}) + (1 - \lambda_n) P_M \right) \psi^{(n)}(x, y). \qquad (2.89)$$

The parameter $\lambda_n$ is a relaxation parameter. As above, the index M refers to
projections/reflections on the set of functions reproducing the measured intensities, whereas
the subscript O indicates operations fulfilling constraints in the object domain. These 
can be range constraints, sparsity/shearlet constraints, separability constraints or sup- 
port constraints. For more details on this algorithm see Chaps. 6 and 23. 
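
The RAAR update can be written compactly once the two projectors are available as functions. The Python sketch below is an illustration under assumptions, not a reference implementation: `raar_update` is a hypothetical name, the relaxation parameter is kept constant, and a far-field modulus projector and a support projector serve as toy examples for the measurement and object constraints.

```python
import numpy as np

def raar_update(psi, P_M, P_O, lam=0.8):
    """One RAAR step: lam/2 * (R_O R_M + Id) psi + (1 - lam) * P_M psi, with R = 2P - Id."""
    p_m = P_M(psi)
    r_m = 2 * p_m - psi              # reflection on the measurement set
    r_o_r_m = 2 * P_O(r_m) - r_m     # followed by reflection on the object set
    return 0.5 * lam * (r_o_r_m + psi) + (1 - lam) * p_m

# Toy projectors (assumptions): Fourier modulus and a compact support.
rng = np.random.default_rng(2)
n = 32
support = np.zeros((n, n), dtype=bool)
support[8:20, 12:24] = True
obj = np.where(support, rng.random((n, n)), 0.0).astype(complex)
M = np.abs(np.fft.fft2(obj))

def P_M(psi):
    F = np.fft.fft2(psi)
    return np.fft.ifft2(M * np.exp(1j * np.angle(F)))

def P_O(psi):
    return np.where(support, psi, 0.0)

psi = rng.random((n, n)).astype(complex)
for _ in range(200):
    psi = raar_update(psi, P_M, P_O, lam=0.8)
```

For a relaxation parameter of zero the step reduces to the measurement projection alone, and for a value of one it becomes the plain average of the identity and the double reflection.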

Figure 2.8 depicts the results of three different phase retrieval methods applied to 
the simulated holographic intensities in (a). The phantom shown in (b) consists of 
spheres with radii between 50 and 200 nm of different materials (Al, Al2O3, Ca) inside
a volume. The incoming illumination was simulated as a plane wave with photon
energy of 7 keV. The fact that the spheres are not made of a single material violates
the assumptions required to derive single step CTF-based phase retrieval (2.79).
Hence, the reconstruction shown in (d) exhibits blurry regions (absorbing and phase
shifting components $\beta$ and $\delta$ were set to the mean values of the given materials). The
iterative modified Hybrid-Input-Output (e) and the Relaxed-Averaged-Alternating-Reflections
algorithm (f) can reveal features of the projected phase more distinctly and
with less blur. A detailed comparison between iterative and analytic phase retrieval
for experimental data can be found in [52].

Fig. 2.8 a Simulated holographic intensities of a sample consisting of multi-material spheres.
b True projected phases of the sample. c Line profiles through the phantom and the reconstructions
shown in (d-f). d Reconstructed phase by the CTF-based single step technique. e Phase
retrieval result by the modified Hybrid-Input-Output algorithm (mHIO). f Phase retrieval result by
the Relaxed-Averaged-Alternating-Reflections (RAAR) algorithm

2.4 From Two to Three Dimensions: Tomography
and Phase Retrieval


Inverting the holographic intensity distributions via phase retrieval techniques, one 
arrives at a two dimensional (2d) image proportional to the projected index variation, 
or equivalently—when considering the real part only—the projected electron density. 
But for a three dimensional (3d) object f(x, y, z), we would also like to find the 3d 
structure as given by the refractive index n(r). For this purpose, many projections of 
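the sample have to be recorded under different angles, see Fig. 2.9a. Projected gray
values onto a single line s under an angle $\theta$ shown in Fig. 2.9b correspond to

$$\mathcal{R}[f(\mathbf{r})](s) = \int f(\mathbf{r}) \, \delta_{\mathrm{Dirac}}(\mathbf{r} \cdot \mathbf{n}_\theta - s) \, \mathrm{d}^2x, \qquad (2.90)$$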


where $\mathbf{r}$ is a vector in a 2d hyper-plane of the 3d volume, $\delta_{\mathrm{Dirac}}$ is the Dirac delta
distribution and $\mathbf{n}_\theta$ is the unit length vector pointing in direction of s. The operator
$\mathcal{R}[f(\mathbf{r})](s)$ is denoted as the Radon transform. It results in the 1d projection of the
sample onto the line s. Figure 2.9b illustrates the discrete version of (2.90) (integral
is replaced by sum). It shows the top view of a selected slice (xy-plane) through the
sample f (gray shaded region). The operator $\mathcal{R}[f(\mathbf{r})](s)$ computes line integrals
through the object: Whenever the scalar product $\mathbf{r} \cdot \mathbf{n}_\theta$ equals a distinct value s
(here: $s_a$ and $s_b$) corresponding to the projection of $\mathbf{r}$ (here illustrated by $a_n$, $a_{n+m}$,
$b_n$, $b_{n+m}$) onto $\mathbf{n}_\theta$, the value of the object $f(\mathbf{r})$ contributes to the projected gray
value in a specific bin of the detector (here: the gray values at $s_a$ and $s_b$).
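
The discrete version of the Radon transform described above can be sketched directly: every pixel contributes its value to the detector bin selected by its scalar product with the unit vector of the projection direction. The following Python example is a simple nearest-bin illustration under assumptions (function name, unit bin width, centre of rotation at the image centre, toy object); practical tomography codes use interpolating projectors instead.

```python
import numpy as np

def radon_binned(f, theta):
    """Discrete Radon transform of a 2d slice f for one angle theta (nearest-bin sum)."""
    ny, nx = f.shape
    y, x = np.indices(f.shape, dtype=float)
    x -= (nx - 1) / 2                 # coordinates relative to the centre of rotation
    y -= (ny - 1) / 2
    s = x * np.cos(theta) + y * np.sin(theta)      # r . n_theta for every pixel
    n_bins = int(np.ceil(np.hypot(nx, ny))) + 1    # enough detector bins for any angle
    idx = np.floor(s + n_bins / 2).astype(int)     # detector bin (unit width) of each pixel
    return np.bincount(idx.ravel(), weights=f.ravel(), minlength=n_bins)

# Toy slice (assumption): a homogeneous rectangle.
f = np.zeros((64, 64))
f[20:30, 35:50] = 1.0
angles = np.linspace(0.0, np.pi, 90, endpoint=False)
sinogram = np.stack([radon_binned(f, th) for th in angles])   # one row per angle
```

Since each pixel lands in exactly one bin, the sum over every row of `sinogram` equals the total sum of the slice, which is a convenient consistency check for projection data.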

There are different techniques to invert the Radon transform, i.e. to reconstruct 
f(r) from a set of projections. The most popular and widely used method is the 
so-called filtered back-projection. Its basic ingredients are a Fourier-filtering step 
of the projection data by a ramp filter and the back-projection or smearing out of 
the filtered projection values along straight lines equal to the paths of the line inte- 
grals of the Radon transform. In-depth literature about tomography can be found in 
[14, 53, 54]. Finally, Fig. 2.9c summarizes the basic steps of three-dimensional holo- 
graphic X-ray imaging (holographic tomography): A small sample is rotated in a 
(partially) coherent beam (see Fig. 2.6a), and for each angle a holographic intensity 
distribution is recorded. Each of the holograms needs to be processed by a suitable 
phase retrieval technique resulting in an image proportional to the object’s projected 
electron density. Finally, the 3d object is computed using the information collected 
by all of the projections. 

A different approach is to combine phase retrieval and tomographic reconstruc- 
tion. Instead of performing phase retrieval of all projections first and in a second 
step the inverse Radon transform (e.g. filtered back-projection), these two steps can 
be intertwined iteratively: Iterative Reprojection Phase retrieval (IRP) [55]. In 
this way, an iterate of the full 3d object exists at all times during the process, which 
ensures tomographic consistency of all projections. This was found to facilitate phase 
retrieval, effectively acting as a constraint of its own.

Fig. 2.9 From two to three dimensions: a Projections of a sample (consisting of spheres) are
recorded under different angles. This can be realized by either rotating the sample or the detector.
b Illustration of (2.90): The delta-distribution in combination with the scalar product $\mathbf{r} \cdot \mathbf{n}_\theta$ selects
points with a component of length s in direction of $\mathbf{n}_\theta$. These points lie on lines pointing to the
detector under the angle $\theta$. The summed values of f (shaded region) along these lines are recorded
by the detector. Note that (b) illustrates a single plane/slice through the three dimensional object
perpendicular to the axis of rotation, see outermost right sketch in (c) for clarity. c Holographic
X-ray microscopy is used for high resolution tomography. Holographic intensity distributions are
collected under different angles. Each single projection is processed by phase retrieval algorithms.
The reconstructed phases are used for inversion of the Radon transform, i.e. for three dimensional
tomographic reconstruction


Since IRP is computationally very involved, another simultaneous operation of 
tomographic reconstruction and phase retrieval by CTF was proposed, which also 
couples the two previously sequential operations [56]. It relies on propagation of the 
transmission function of an entire 3d object—the propagated object—and requires 
linearization of the object’s optical properties. If this approximation is justified, the 
concept of propagating the entire 3d object in parallel is extremely useful for holotomography,
and shall be briefly introduced here, following [56].

To this end, let us consider a plane wave $\psi_0$ along the optical axis z impinging on
a 3d object, parameterized by the spatial distribution of the index of refraction $n(\mathbf{r})$.
If the projection approximation holds, the exit wave $\psi(x, y, z = 0)$ is determined
by the projection of the index of refraction onto a plane perpendicular to the optical
axis, which can be written in terms of the Radon operator $\mathcal{R}(n - 1) := \int (n - 1) \, \mathrm{d}z$
as

$$\psi(x, y, z = 0) = \psi_0(x, y) \exp[i k_0 \mathcal{R}(n - 1)] \approx \psi_0(x, y) [1 + i k_0 \mathcal{R}(n - 1)], \qquad (2.91)$$


where the approximation of an optically weak object with $|k_0 \mathcal{R}(n - 1)| \ll 1$ was
used to linearize the exponential function. Following the angular spectrum approach
in Sect. 2.1.2, the free-space propagation to the detector plane at distance z (using
the propagator $P_{2d}$) can be formulated as

$$\psi(x, y, z) = [\mathcal{F}_{2d}^{-1} P_{2d} \mathcal{F}_{2d} \psi(\cdot, \cdot, 0)](x, y, z)
\approx \psi_0(x, y) \exp[i k_0 z] + i k_0 \psi_0(x, y) [\mathcal{F}_{2d}^{-1} P_{2d}(\cdot, \cdot, z) \mathcal{F}_{2d} \mathcal{R}(n - 1)](x, y), \qquad (2.92)$$

with the radially symmetric free-space propagator function $P_{2d}(k_x, k_y, z) =
\exp(i z [k_0^2 - k_x^2 - k_y^2]^{1/2})$ (Fourier space coordinates $k_x$, $k_y$ and wavenumber $k_0 =
2\pi/\lambda$), followed by a 2d Fourier back-transformation $\mathcal{F}_{2d}^{-1}$. Next, we consider the 2d
Fourier transform of a projection of the object, e.g. along z, which can be rewritten
as

$$\mathcal{F}_{2d}[\mathcal{R}(n - 1)] = \iint \left[ \int (n - 1) \, \mathrm{d}z \right] \exp[-i (k_x x + k_y y)] \, \mathrm{d}x \, \mathrm{d}y = [\mathcal{F}_{3d}(n - 1)]_{k_z = 0}. \qquad (2.93)$$


From (2.93), it can be inferred that the central slice of the 3d Fourier transform of 
the object (given by n — 1), equals the 2d Fourier transform of a projection normal 
to the slice. This is an important geometric relation in tomography, known as the 
Fourier slice theorem. Here we use it to show that the order of projection and 
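propagation $\mathcal{D}$ can be inverted [56]. We consider the 2d propagation $\mathcal{D}_{2d}$ of the
projection $\mathcal{R}(n - 1)$ [56]

$$\begin{aligned}
\mathcal{D}_{2d} \mathcal{R}(n - 1) &= \mathcal{F}_{2d}^{-1} \{ P_{2d} \cdot \mathcal{F}_{2d} [\mathcal{R}(n(x, y, z) - 1)] \} \\
&= \mathcal{F}_{2d}^{-1} \{ P_{2d} \cdot (\mathcal{F}_{3d}(n - 1))_{k_z = 0} \} \\
&= \mathcal{F}_{2d}^{-1} \{ P_{3d} \cdot \mathcal{F}_{3d}[n - 1] \cdot \delta_{\mathrm{Dirac}}(k_z) \} \\
&= \mathcal{F}_{2d}^{-1} \{ [P_{3d} \cdot \mathcal{F}_{3d}(n - 1)]_{k_z = 0} \} \\
&= \mathcal{F}_{2d}^{-1} \{ \mathcal{F}_{2d} [\mathcal{R}(\mathcal{F}_{3d}^{-1}(P_{3d} \cdot \mathcal{F}_{3d}(n - 1)))] \} \\
&= \mathcal{R}[\mathcal{D}_{3d}(n - 1)].
\end{aligned} \qquad (2.94)$$

The term $P_{3d} = \exp[i z \sqrt{k_0^2 - k_x^2 - k_y^2 - k_z^2}]$ is the 3d propagator function. As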
detailed in [56], this result is the starting point to perform CTF approximations and 
fast iterative phase retrieval directly in 3d, i.e. simultaneously with tomographic 
reconstruction. Within a few iterations, tomographic consistency can be enforced 
and serves as constraint for phase retrieval. 
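
Both the Fourier slice theorem (2.93) and the exchange of projection and propagation (2.94) hold exactly for discrete Fourier transforms on a periodic grid, so they can be checked numerically. The Python sketch below does this for a random test volume; the grid size, voxel size, wavelength and propagation distance are arbitrary assumptions, and the projection is implemented as a plain sum along z.

```python
import numpy as np

rng = np.random.default_rng(0)
N, dx = 16, 1.0                     # grid size and voxel size (assumptions)
k0 = 2 * np.pi / 0.5                # wavenumber for an assumed wavelength of 0.5
z = 3.0                             # propagation distance (assumption)
obj = 1e-3 * rng.normal(size=(N, N, N))   # stands in for n - 1

k = 2 * np.pi * np.fft.fftfreq(N, d=dx)
kx, ky, kz = np.meshgrid(k, k, k, indexing="ij")
P3d = np.exp(1j * z * np.sqrt(k0**2 - kx**2 - ky**2 - kz**2))
P2d = P3d[:, :, 0]                  # k_z = 0 slice of the 3d propagator function

# Fourier slice theorem: central slice of the 3d transform = 2d transform of the projection.
central_slice = np.fft.fftn(obj)[:, :, 0]
projection_ft = np.fft.fft2(obj.sum(axis=2))

# Propagate the projection in 2d versus project the 3d-propagated object.
propagate_then = np.fft.ifft2(P2d * projection_ft)
project_then = np.fft.ifftn(P3d * np.fft.fftn(obj)).sum(axis=2)

print(np.allclose(central_slice, projection_ft))
print(np.allclose(propagate_then, project_then))
```

Summing the inverse 3d transform over z singles out the k_z = 0 plane, which is the discrete counterpart of the Dirac delta appearing in (2.94).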

Chapter 3
X-ray Focusing and Optics


Tim Salditt and Markus Osterhoff 


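Es ist aber leicht einzusehen, daß bei nahezu streifender
Inzidenz der Röntgenstrahlen im Falle n < 1 eine nachweisbare
Totalreflexion auftreten muß.
[It is, however, easy to see that for nearly grazing incidence
of the X-rays, in the case n < 1, a detectable total reflection must occur.]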

— Albert Einstein, 21st March, 1918 


3.1 General Aspects of X-ray Optics and Focusing 


X-ray optics can be considered as optics in the “vacuum limit”. In fact, the index
of refraction $n = 1 - \delta + i\beta$ asymptotically approaches one for high photon energy
E, as $\delta$ and $\beta$ decrease algebraically for photon energies above an atomic
resonance, i.e. an absorption edge given by the corresponding electronic binding
energy. For all materials, the X-ray regime is hence characterized by extremely small
differences in the indices of refraction. This can be a blessing in terms of penetration
power, or the validity of various approximations, such as kinematic scattering (neglect
of multiple scattering) or the projection approximation, as addressed in Chap. 2. At
the same time it can also be a curse, as one readily realizes the challenge of focusing
radiation when the index difference between a lens and air or vacuum goes to
zero. More generally, not only focusing but any type of optical element and function
is heavily constrained by the small differences in the index of refraction. For this
reason, it is not yet possible to focus down to the X-ray wavelength $\lambda$. In Abbe’s sense,
the diffraction limit is not set by $\lambda$ but by the achievable numerical aperture. In other
words, the diffraction limit for X-rays is a limit of the diffraction structure. This has
raised the question of a fundamental resolution limit for X-rays existing above the
wavelength. In 2003, Bergemann and van der Veen conjectured a fundamental
length scale given by the decay length of evanescent waves, which should present
a lower limit for any X-ray focus size well above the wavelength $\lambda$. This critical
length prototypically appears in waveguide optics in terms of the minimum width
to which a mode can be confined, as explained further below, and roughly ranges
between 8 and 15 nm depending on the density of the material used for focusing [1].
This limit was later rejected and successively disproved [2, 3]. Yet, the idea is correct
that the small contrast in the index of refraction for X-rays and the correspondingly
large decay length of evanescent waves indeed significantly constrain our focusing
capabilities. More than fifteen years after the postulation of the “Bergemann limit”
we must realize that we still do not know the minimum focus size nor the maximum
local field enhancement (gain) in focusing X-ray radiation. Experimentally, however,
10 nm focal size has become a reality also for hard X-rays [4, 5]. Figure 3.1 illustrates
the rapid development of hard X-ray focusing over the last two decades. This progress
became possible after the advent of high brilliance (3rd generation) synchrotron
radiation, which provides the necessary coherence for diffraction-limited or
near-diffraction limited focusing.

Fig. 3.1 Illustration of the rapid progress in X-ray nano-focusing, following the advent of 3rd
generation synchrotron radiation. Major benchmarks are displayed as full symbols (2d focusing)
or empty symbols (1d focusing), representing the following references (from left to right): FZP
optics (brown circles) [6-9], CRL (black downwards-pointed triangles) [10-15], KB mirror optics
[16-22], MLL/MZP (purple diamonds) [4, 23-28], and WG or KB/WG compound optics (orange
triangles) [29-33], respectively. The arrows mark results obtained by the CRC ‘Nanoscale Photonic
Imaging’ in Göttingen

The primary challenge for X-ray microscopy is hence to narrow down the gap 
between the theoretical resolution limit associated with the wavelength $\lambda$ and the
actual resolution limited by the optical systems. Fresnel zone plates (FZP) have been 
developed as X-ray focusing and objective lenses by G. Schmahl and colleagues 
in Göttingen, initially for X-ray microscopy in the soft X-ray range (0.2-1.2 keV, 
$\lambda$ = 1-7 nm). Spot sizes of soft X-ray microscopes of around 30 nm are common;
“best values” are about two times smaller, in the range 10-15 nm [34]. Hard X-ray 
zone-plate optics was for a long time limited to above ~0.25 μm, but over the last fifteen
years, significant progress has been achieved by several advanced concepts which
realize FZP optics based on multilayer deposition on planar solids as well as on thin
wires. Such structures are denoted as multilayer Laue lenses (MLL) and multilayer
zone plates (MZP), respectively, and will be discussed in detail in Sect. 3.5. Progress 
within the present collaborative research center (CRC Nanoscale Photonic Imaging),
in particular, has resulted in 5 nm point focusing by MZP optics. Compared to
diffractive optics, refractive optics as used for visible and UV light seem, at first
glance, unsuitable due to the small refractive index decrement, with $\delta$ on the order
of $10^{-5}$ for hard X-rays. To realize refraction comparable to that of lenses for visible
light, a multitude of lenses must be lined up; this concept of compound refractive 
lenses (CRL) was invented in the 1990s by A. Snigirev and B. Lengeler at the Euro- 
pean Synchrotron Radiation Facility (ESRF) in Grenoble [15], and has been thriving 
since. Today, CRLs made out of beryllium are found at almost every synchrotron
beamline. For nano-focusing, CRLs fabricated by electron beam (e-beam) lithography
in silicon have been developed by C. Schroer, and reach spot sizes down to
50 nm [14]. Next to diffractive and refractive optics, reflective optics can be implemented
for hard X-rays, taking advantage of grazing-incidence total reflection or
multilayer-constructive reflection. Curved mirrors have long been appreciated
as high efficiency and non-dispersive focusing elements for synchrotron radiation.
In the 1990s, with the advent of 3rd generation synchrotron sources, mirror-based optics
reached spot sizes in the range of 1-5 μm. With novel polishing tools for highly
curved mirrors developed by the group of K. Yamauchi in Osaka [35], and alternatively
with Kirkpatrick-Baez (KB) mirrors with adaptive bending as implemented by
O. Hignette at ESRF [18], sub-100 nm focusing became available ten years ago. At
the same time, first compound optics with two-stage focusing or collimation was
implemented for hard X-rays. Using a combination of high gain KB mirrors and
X-ray waveguide optics, a 25 × 47 nm² exit beam with clean background and high
degree of coherence was demonstrated in [30]. In the course of subsequent research
within CRC Nanoscale Photonic Imaging, waveguide optics has been significantly
improved, and point focusing down to 10 nm (in the exit plane of the waveguide)
is now possible. At the same time efficiency has also been significantly improved.
As a result, X-ray micro- and nanofocusing can be implemented today by
diffractive (example: Fresnel zone plates), reflective (examples: Kirkpatrick-Baez
mirrors, waveguides) and refractive optical elements (example: compound refractive
lenses), and/or combinations thereof. X-ray optics and in particular nanofocusing has
been an enabling tool to extend X-ray microscopy over the recent years, in spectral 
range, in resolution and in contrast mechanism. This is true not only for the classi- 
cal full-field scheme of transmission X-ray microscopy (TXM) which is based on 
objective zone plates, or scanning X-ray transmission microscopy (STXM), but also 
for coherent diffraction imaging (CDI) and holography, which also take advantage 
of X-ray focusing, even if the resolution is no longer limited by the focal
size. Figure 3.2 illustrates the rapid development of hard X-ray focusing over the last 
two decades, following the advent of high brilliance (3rd generation) synchrotron 
radiation, which had provided the necessary coherence for diffraction-limited or 
near-diffraction limited focusing.

In this chapter, we give an introduction to reflective and diffractive X-ray optics,
to provide basic knowledge for further chapters. For refractive optics, we refer to
the excellent reviews in [36]. In Sect. 3.2, we first present the basics of X-ray reflectivity,
followed by a section on mirrors (Sect. 3.3) and X-ray waveguides (Sect. 3.4).
Section 3.5 then presents Fresnel zone plate (FZP) optics, and Sect. 3.6 an introduc- 
tion to coherence. We close by briefly addressing compound optics and different 
variants of X-ray microscopes (Sect. 3.7). 


3.2 X-ray Reflectivity and Reflective X-ray Optics 


In Chap. 2 we have justified the use of scalar wave theory in the hard X-ray spectral 
range. Therefore, also Fresnel reflectivity can be accounted for simply by considering 
the boundary conditions of a scalar wave $\psi$ at interfaces of layered materials. More
generally, one has to differentiate between the different polarisation states. The scalar
approximation holds, since the decrements of the index of refraction $\delta$ and $\beta$ are much
smaller than unity, and only small angles (much smaller than the Brewster angle) are 
relevant in X-ray reflectivity. In fact, small-angle approximation is also warranted 
in most cases. There are excellent treatments of X-ray reflectivity [37-39]. In this 
section, we follow the derivation presented in the textbook ‘Elements of modern 
X-ray physics’ by Als-Nielsen and McMorrow [37]. 


3.2.1 X-ray Reflectivity of an Ideal Single Interface 


Consider a scalar wave with wave vector $\mathbf{k}_I$ and an amplitude $a_I$, impinging from
vacuum onto a semi-infinite medium with a sharp interface. The reflected wave
is denoted by $\mathbf{k}_R$ and $a_R$, and the transmitted wave by $\mathbf{k}_T$ and $a_T$. As boundary
conditions we require the wave $\psi$ and its derivative $\nabla\psi$ to be continuous at the
interface between the two media (Fig. 3.3)

$$a_I + a_R = a_T \qquad (3.1)$$

and

$$a_I \mathbf{k}_I + a_R \mathbf{k}_R = a_T \mathbf{k}_T. \qquad (3.2)$$

Fig. 3.3 An incoming wave $\psi_I = a_I e^{i \mathbf{k}_I \cdot \mathbf{r}}$ at an incident angle $\alpha$ is partly reflected
under the same angle $\alpha$ forming the wave $\psi_R = a_R e^{i \mathbf{k}_R \cdot \mathbf{r}}$ and partly transmitted under an
angle $\alpha'$ forming a transmitted wave $\psi_T = a_T e^{i \mathbf{k}_T \cdot \mathbf{r}}$, following [37]


The wave number is $k = |\mathbf{k}_I| = |\mathbf{k}_R|$ in vacuum, and $nk = |\mathbf{k}_T|$ in the medium. Considering
the components of the wave vector parallel and perpendicular to the surface
yields

$$a_I k \cos\alpha + a_R k \cos\alpha = a_T n k \cos\alpha' \qquad (3.3)$$

and

$$-(a_I - a_R) k \sin\alpha = -a_T n k \sin\alpha'. \qquad (3.4)$$

From the above equations, Snell’s law is obtained

$$\cos\alpha = n \cos\alpha'. \qquad (3.5)$$

Approximating the cosines for small angles using $\cos\alpha \approx 1 - \alpha^2/2$, and $n =
1 - \delta + i\beta$, one finds

$$\alpha^2 = \alpha'^2 + 2\delta - 2i\beta = \alpha'^2 + \alpha_c^2 - 2i\beta. \qquad (3.6)$$

Here, $\alpha_c = \sqrt{2\delta}$ denotes the critical angle of total external reflection from the
optically thicker (here: vacuum) to the optically thinner medium. Using (3.1) and
(3.4), we have

$$\frac{a_I - a_R}{a_I + a_R} = n \frac{\sin\alpha'}{\sin\alpha} \approx \frac{\alpha'}{\alpha}. \qquad (3.7)$$

With

$$\frac{1 - r}{1 + r} = \frac{\alpha'}{\alpha} \qquad (3.8)$$

and $1 + r = t$, this leads to the Fresnel equations

$$r = \frac{a_R}{a_I} = \frac{\alpha - \alpha'}{\alpha + \alpha'}, \qquad (3.9)$$

$$t = \frac{a_T}{a_I} = 1 + \frac{\alpha - \alpha'}{\alpha + \alpha'} = \frac{2\alpha}{\alpha + \alpha'}, \qquad (3.10)$$

where r denotes the amplitude reflectivity and t the amplitude transmission function.
The intensity reflectivity is expressed by

$$R_F = \left| \frac{\alpha - \alpha'}{\alpha + \alpha'} \right|^2 = \left| \frac{q - q'}{q + q'} \right|^2, \qquad (3.11)$$

where $q = 2k \sin\alpha$ and $q' = 2k \sin\alpha'$ denote the momentum transfer, which is
always perpendicular to the interface. The reflectivity as a function of q is unity up to
the critical wave vector $q_c = 2k\sqrt{2\delta}$ (discarding absorption) and then decreases
algebraically with $q^{-4}$. This characteristic makes X-ray reflectometry a powerful tool
for the study of surfaces and interfaces of materials, since weak signals of interface
disturbances can interfere with this “carrier wave”, such that the signal of a single
atomic layer becomes observable. The transmission function $T = |t|^2$ increases from
zero up to $q_c$, where it reaches a maximum of four (discarding absorption), and then
decreases again to unity for $q \gg q_c$. The propagation angle in the medium $\alpha'$ is a
complex number, which can hence be decomposed into

$$\alpha' = \mathrm{Re}(\alpha') + i \, \mathrm{Im}(\alpha'). \qquad (3.12)$$

Accordingly, the transmitted wave can be expressed by

$$a_T e^{i k \alpha' z} = a_T e^{i k \mathrm{Re}(\alpha') z} e^{-k \mathrm{Im}(\alpha') z}. \qquad (3.13)$$

Hence, the intensity falls off with a 1/e penetration depth $\Lambda$ given by

$$\Lambda = \frac{1}{2k \, \mathrm{Im}(\alpha')} = \frac{1}{\mathrm{Im}(q')}. \qquad (3.14)$$

Below the critical angle, the real term is zero and the wave is purely evanescent
with a decay length which goes to $1/q_c$ for $\alpha \ll \alpha_c$. This localisation of intensity to
the immediate sub-surface region is exploited in grazing incidence diffraction (GID)
[40], and grazing incidence small-angle scattering (GISAXS) [41] (Fig. 3.4).

Fig. 3.4 Fresnel reflectivity R and transmission T as a function of momentum transfer q, expressed
in natural (dimensionless) units $q/q_c$
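
The Fresnel expressions for a single interface are easily evaluated numerically in the dimensionless variable q/q_c. The Python sketch below is an illustration under assumptions: the function name and the parameterization of absorption by the ratio of the imaginary to the real decrement of the refractive index are choices made here, and the small-angle form of the momentum transfer in the medium, q'^2 = q^2 - q_c^2 (1 - i beta/delta), is used.

```python
import numpy as np

def fresnel_single_interface(x, beta_over_delta=0.0):
    """Fresnel reflectivity R and transmission T versus x = q / q_c (small-angle limit)."""
    x = np.asarray(x, dtype=float)
    # momentum transfer inside the medium in units of q_c; principal root has Im >= 0
    xp = np.sqrt(x**2 - 1.0 + 1j * beta_over_delta)
    r = (x - xp) / (x + xp)        # amplitude reflectivity
    t = 2 * x / (x + xp)           # amplitude transmission
    return np.abs(r)**2, np.abs(t)**2

x = np.linspace(0.05, 5.0, 400)
R_ideal, T_ideal = fresnel_single_interface(x)              # no absorption
R_abs, T_abs = fresnel_single_interface(x, 0.05)            # weak absorption (assumed value)
```

Without absorption the computed reflectivity is unity below the critical momentum transfer, the transmission reaches four at q = q_c, and far above q_c the reflectivity approaches (q_c / 2q)^4.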


3.2.2 Multiple Interfaces and Multilayers 


Let us first consider reflectivity in case of a sample with one layer above the substrate,
still following [37]. In the following, $n_0$ is the index of refraction of vacuum, $n_1$ the
index of refraction of the layer and $n_2$ the index of refraction of the substrate. In
contrast to the case of reflection from a pure substrate, there is now a series of
possible reflections:


(i) Firstly, reflection at the interface 0 to 1 (interface vacuum/layer), amplitude
reflectivity is $r_{01}$.

(ii) Secondly, transmission at the interface 0 to 1, $t_{01}$, then reflection at the interface
1 to 2, $r_{12}$, followed by transmission at the interface 1 to 0, $t_{10}$. By adding this
wave to the above, it is necessary to include the phase factor $p^2 = e^{iq\Delta}$, where
$\Delta$ is the thickness of the layer.

(iii) Thirdly, transmission at the interface 0 to 1, $t_{01}$, then reflection at the interface
1 to 2, $r_{12}$, followed by reflection at the interface 1 to 0, $r_{10}$, then another
reflection at the interface 1 to 2, $r_{12}$, finally followed by transmission 1 to 0,
$t_{10}$. The total phase factor for this wave is $p^4$.


Hence, the total amplitude reflectivity is

$$\begin{aligned}
r_{\mathrm{layer}} &= r_{01} + t_{01} t_{10} r_{12} p^2 + t_{01} t_{10} r_{10} r_{12}^2 p^4 + t_{01} t_{10} r_{10}^2 r_{12}^3 p^6 + \cdots \\
&= r_{01} + t_{01} t_{10} r_{12} p^2 \left[ 1 + r_{10} r_{12} p^2 + r_{10}^2 r_{12}^2 p^4 + \cdots \right] \\
&= r_{01} + t_{01} t_{10} r_{12} p^2 \sum_{m=0}^{\infty} (r_{10} r_{12} p^2)^m \\
&= r_{01} + t_{01} t_{10} r_{12} p^2 \frac{1}{1 - r_{10} r_{12} p^2},
\end{aligned} \qquad (3.15)$$

where the geometrical series has been used in the last line. Using the definitions of
r and t, as presented in the previous subsection, we obtain

$$r_{01}^2 + t_{01} t_{10} = \frac{(q_0 - q_1)^2}{(q_0 + q_1)^2} + \frac{2 q_0 \cdot 2 q_1}{(q_0 + q_1)^2} = \frac{(q_0 + q_1)^2}{(q_0 + q_1)^2} = 1, \qquad (3.16)$$

with $r_{01} = -r_{10}$. Inserting this expression into the equation of $r_{\mathrm{layer}}$ leads to

$$r_{\mathrm{layer}} = \frac{r_{01} + r_{12} p^2}{1 + r_{01} r_{12} p^2}. \qquad (3.17)$$


The equation for the reflectivity of a thin layer (layer thickness $\Delta$) can be further
simplified for the case of identical materials on either side of the layer. In this case,
$r_{01} = -r_{12}$ holds and (3.17) is simplified to

$$r_{\mathrm{layer}} = \frac{r_{01} (1 - p^2)}{1 - r_{01}^2 p^2}. \qquad (3.18)$$


While the above equation is exact, further approximation can be performed when
considering an angular range where refraction can be neglected (angle sufficiently
large compared to critical angle). In this case $|r_{01}| \ll 1$ ($q \gg q_c$), and the amplitude
reflectivity r(q) can be written as

$$r(q) \approx \left( \frac{q_c}{2q} \right)^2. \qquad (3.19)$$

Using these assumptions, the amplitude reflectivity of a thin layer becomes

$$r_{\mathrm{layer}} = \frac{r_{01} (1 - p^2)}{1 - r_{01}^2 p^2} \approx r_{01} (1 - p^2) \approx \left( \frac{q_c}{2q} \right)^2 \left( 1 - e^{iq\Delta} \right). \qquad (3.20)$$

With $q_c^2 = 16\pi\rho r_0$, where $\rho$ is the electron density and $r_0$ the classical electron
radius, this can be rewritten as

$$r_{\mathrm{layer}} = \frac{16\pi\rho r_0}{4q^2} \left( 1 - e^{iq\Delta} \right) \qquad (3.21)$$

$$= -i \, \frac{16\pi\rho r_0 \Delta}{2q} \, \frac{e^{iq\Delta/2}}{2(q\Delta/2)} \, \frac{e^{iq\Delta/2} - e^{-iq\Delta/2}}{2i} \qquad (3.22)$$

$$= -i \, \frac{4\pi\rho r_0 \Delta}{q} \, \frac{\sin(q\Delta/2)}{q\Delta/2} \, e^{iq\Delta/2}. \qquad (3.23)$$

As the equation is supposed to describe the properties of a thin layer (layer
thickness $\Delta$), we assume $q\Delta \ll 1$, which results in

$$r_{\mathrm{thin\ layer}} \approx -i \, \frac{4\pi\rho r_0 \Delta}{q} = -i \, \frac{\lambda \rho r_0 \Delta}{\sin(\alpha)}, \qquad (3.24)$$

using

$$q = \frac{4\pi}{\lambda} \sin(\alpha). \qquad (3.25)$$

Fig. 3.5 Reflectivity of a stratified medium (e.g. multilayered thin film), with reflected and transmitted
beams, as required for the Parratt algorithm. The reflectivity is calculated recursively from
bottom to top. Each layer is parameterized with thickness $d_j$, roughness $\sigma_j$, and complex-valued
index of refraction $n_j$. Profiles which are not step-wise constant can be approximated by a sequence
of sufficiently thin slabs. Illustration following [39]


The expression for the reflectivity of a thin layer in (3.24) is known as the kine- 
matical reflectivity. Note that this equation only holds for angles sufficiently above 
the critical angle. 
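
The range of validity of the kinematical approximation can be illustrated by comparing it with the exact slab expression (3.18). The following Python sketch is an example under assumptions: absorption is neglected, the function names are hypothetical, the exact phase factor is evaluated with the momentum transfer inside the layer, and the kinematical form (3.23) is written with 4 pi rho r_0 = q_c^2 / 4.

```python
import numpy as np

def slab_exact(q, q_c, thickness):
    """Exact amplitude reflectivity of a slab with identical media on both sides, cf. (3.18)."""
    q = np.asarray(q, dtype=float)
    q1 = np.sqrt((q**2 - q_c**2).astype(complex))   # momentum transfer inside the slab
    r01 = (q - q1) / (q + q1)
    p2 = np.exp(1j * q1 * thickness)                # phase factor p^2 (assumed to use q1)
    return r01 * (1 - p2) / (1 - r01**2 * p2)

def slab_kinematical(q, q_c, thickness):
    """Kinematical amplitude reflectivity of the same slab, cf. (3.23)."""
    q = np.asarray(q, dtype=float)
    half = q * thickness / 2
    # np.sinc(u) is sin(pi u) / (pi u), hence the division by pi
    return -1j * (q_c**2 * thickness / (4 * q)) * np.sinc(half / np.pi) * np.exp(1j * half)

q_c, thickness = 1.0, 0.5                  # arbitrary units (assumption)
q = np.linspace(0.2, 40.0, 800)
R_exact = np.abs(slab_exact(q, q_c, thickness))**2
R_kin = np.abs(slab_kinematical(q, q_c, thickness))**2
```

Plotting `R_exact` and `R_kin` against q shows agreement well above the critical momentum transfer, whereas the kinematical curve diverges towards small q, where refraction and multiple reflections can no longer be neglected.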

Next, we consider multiple layers (N layers) on top of an infinitely thick substrate,
still following [37]. The reflectivity can be calculated using the Parratt algorithm [42],
which is based on recursion. By definition, the Nth layer is on top of the substrate
(see Fig. 3.5). The z-component of the wave vector, $k_{z,j}$, in the layer denoted j is
determined by the wave vector $\mathbf{k}_j$ and its x-component $k_{x,j}$, which is conserved
through all layers, $k_{x,j} = k_x$:

$$k_{z,j}^2 = (n_j k)^2 - k_x^2 = (1 - \delta_j + i\beta_j)^2 k^2 - k_x^2 \approx k_z^2 - 2\delta_j k^2 + 2i\beta_j k^2. \qquad (3.26)$$

The wave vector transfer in the jth layer is

$$q_j = \sqrt{q^2 - 8k^2\delta_j + i \, 8k^2\beta_j}. \qquad (3.27)$$


In a first step, the reflectivity is calculated for the interface $N$th layer/substrate, yielding

$$r'_{N,s} = \frac{q_N - q_s}{q_N + q_s}. \tag{3.28}$$


Note that no multiple reflections have to be taken into account, since the substrate is assumed to be infinitely thick. Then, the reflectivity at the interface between the $(N-1)$th and the $N$th layer is considered, which can be written as

$$r_{N-1,N} = \frac{r'_{N-1,N} + r'_{N,s}\, p_N^2}{1 + r'_{N-1,N}\, r'_{N,s}\, p_N^2}, \tag{3.29}$$

where the reflectivity expression for a single layer has been used. Here $r'_{j,j+1}$ denotes the reflectivity at the respective interface without considering multiple reflections, given by

$$r'_{j,j+1} = \frac{q_j - q_{j+1}}{q_j + q_{j+1}}. \tag{3.30}$$


Next, the reflectivity at the interface of layers $(N-2)$ and $(N-1)$ is calculated using

$$r_{N-2,N-1} = \frac{r'_{N-2,N-1} + r_{N-1,N}\, p_{N-1}^2}{1 + r'_{N-2,N-1}\, r_{N-1,N}\, p_{N-1}^2}. \tag{3.31}$$


This procedure of determining the respective reflectivities can be repeated until the total reflectivity amplitude $r_{0,1}$ at the top interface vacuum/1st layer is obtained. This iterative solution is the basis of reflectivity codes such as IMD by Windt [43]. Note that not only the reflectivity as shown here, but also the full fields inside the structure can be computed, by this or equivalent matrix methods with field vectors in each layer and boundary conditions at the interfaces taken into account. Typical reflectivity curves of periodic multilayers exhibit strong multilayer Bragg peaks, as well as total thickness oscillations known as Kiessig fringes. The latter reflect the interference of the waves reflected from the vacuum/layer and layer/substrate interfaces. From the period of these oscillations, the total thickness of the layer stack can be determined.
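
The recursion (3.28)-(3.31) translates almost line by line into code. The following Python sketch is an illustration only and is not the implementation of IMD or of any other package: the function name `parratt_reflectivity`, its argument layout, and the phase factor $p_j^2 = \exp(i q_j \Delta_j)$ for a layer of thickness $\Delta_j$ are assumptions made here, and interfacial roughness is ignored.

```python
import numpy as np

def parratt_reflectivity(q, delta, beta, thickness, k):
    """Amplitude reflectivity r_{0,1}(q) of a layer stack (hypothetical helper).

    q         : vacuum wave vector transfers, q = 2 k sin(alpha)
    delta     : decrements for vacuum, layers 1..N, substrate (length N+2)
    beta      : absorption indices, same ordering and length as delta
    thickness : thicknesses of layers 1..N (length N, may be empty)
    k         : vacuum wave number 2*pi/lambda
    """
    q = np.asarray(q, dtype=complex)
    # (3.27): wave vector transfer inside every medium
    qj = [np.sqrt(q**2 - 8 * k**2 * d + 8j * k**2 * b)
          for d, b in zip(delta, beta)]
    # (3.30): single-interface coefficients r'_{j,j+1}
    fresnel = [(qj[j] - qj[j + 1]) / (qj[j] + qj[j + 1])
               for j in range(len(qj) - 1)]
    # (3.28): start at the interface layer N / substrate
    r = fresnel[-1]
    # (3.29), (3.31): add one layer at a time, from bottom to top
    for j in range(len(thickness) - 1, -1, -1):
        p2 = np.exp(1j * qj[j + 1] * thickness[j])  # assumed phase factor p^2
        r = (fresnel[j] + r * p2) / (1 + fresnel[j] * r * p2)
    return r
```

For a bare substrate (empty `thickness`) the function reduces to the single-interface Fresnel coefficient, which approaches $(q_c/2q)^2$ of (3.19) for $q \gg q_c$, with $q_c^2 = 8k^2\delta$ according to (3.27). The intensity reflectivity follows as `abs(r)**2`.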


3.2.3 Interfacial Roughness 


Generalizing the results obtained for a sharp, flat interface, where the density profile along $z$ can be described by a step function, we now consider interfacial roughness, still following [37]. For real materials, we need to model a graded or rough interface. In this case, the density profile at the interface has to be modified: as a function of depth $z$ it is better described by an error function. Accordingly, the reflectivity $R_F$ of an ideal flat interface, which is given by (3.11), is modified in the case of a rough interface to

$$R = R_F\, e^{-q^2\sigma^2}. \tag{3.32}$$


We can derive this expression in the following way (see [37]): First, we model the density profile of the interface by a function $\rho(z)$ which fulfills $\rho(z) \to 1$ for $z \to \infty$ and $\rho(z) \to 0$ for $z \to -\infty$ (see Fig. 3.6). Most commonly, $\rho(z)$ will be the error function (see below). Now we consider the contribution $\delta r(q)$ to the amplitude reflectivity $r(q)$ from an infinitesimally thin slab at depth $z$,

$$\delta r(q) = -i\,\frac{q_c^2}{4q}\,\rho(z)\,dz, \tag{3.33}$$


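and integrate over all infinitesimally thin layers to obtain the amplitude reflectivity $r(q)$ as a superposition

$$r(q) = -i\,\frac{q_c^2}{4q} \int_{-\infty}^{\infty} \rho(z)\, e^{iqz}\, dz. \tag{3.34}$$

Using partial integration, we find

$$r(q) = i\,\frac{1}{iq}\,\frac{q_c^2}{4q} \int_{-\infty}^{\infty} \frac{d\rho}{dz}\, e^{iqz}\, dz. \tag{3.35}$$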


With (3.19) (limit of a perfect interface, $q \gg 1$), this yields

$$r(q) = r_F(q)\,\phi(q), \tag{3.36}$$

using the definition

$$\phi(q) = \int_{-\infty}^{\infty} \frac{d\rho}{dz}\, e^{iqz}\, dz. \tag{3.37}$$


The function $\phi(q)$ describes the structure of the interface in reciprocal space (when modeling $\rho(z)$ with an error function in real space, as described below, its derivative $d\rho/dz$ will have the form of a Gaussian). The reflectivity (intensity) $R(q)$, as measured in an experiment, is described by the so-called master formula [37]

$$\frac{R(q)}{R_F(q)} = \left|\frac{r(q)}{r_F(q)}\right|^2 = \left|\int_{-\infty}^{\infty} \left(\frac{d\rho}{dz}\right) e^{iqz}\, dz\right|^2, \tag{3.38}$$

which not only holds for a profile broadened by roughness, but more generally for any structured interface profile, within the kinematic approximation. A common choice for the density profile of the interface $\rho(z)$ is the error function $\mathrm{erf}(z)$ (see Fig. 3.6):

$$\rho(z) = \mathrm{erf}\!\left(\frac{z}{\sqrt{2}\,\sigma}\right). \tag{3.39}$$

Fig. 3.6 A vertically smeared out density profile $\rho(z)$ is used to model a rough interface. Illustration following [39]


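The parameter $\sigma$ gives a measure for the width of the graded region of the interface. This smeared out density profile can be regarded as an averaging of the rough surface. The derivative of the error function, $d\rho/dz$, is a Gaussian:

$$\frac{d\rho}{dz} = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{z^2}{2\sigma^2}}. \tag{3.40}$$

Hence we obtain

$$\phi(q) = \int_{-\infty}^{\infty} \frac{d\rho}{dz}\, e^{iqz}\, dz = \int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{z^2}{2\sigma^2}}\, e^{iqz}\, dz. \tag{3.41}$$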

By definition, the right hand side of (3.41) is the Fourier transform of $d\rho(z)/dz$. By computing the integral for the Gaussian case, we obtain

$$\frac{R(q)}{R_F(q)} = \left|\frac{r(q)}{r_F(q)}\right|^2 = e^{-q^2\sigma^2}. \tag{3.42}$$

We can now discern two cases:

(i) $q_z\sigma \gg 1$, the surface is optically rough;
(ii) $q_z\sigma \ll 1$, the surface is optically flat.


Therefore, X-ray reflectivity can be used to quantify the roughness of a surface or 
interface. More importantly in the present context, mirror roughness severely affects 
the focusing intensity and field distribution. 
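
The Gaussian damping factor can be checked numerically. The sketch below (Python with NumPy; the value of `sigma`, the integration grid and the sampled `q_values` are arbitrary choices made for illustration) evaluates the integral (3.37) for a normalized Gaussian $d\rho/dz$ by trapezoidal quadrature and compares $|\phi(q)|^2$ with $e^{-q^2\sigma^2}$ from (3.42).

```python
import numpy as np

sigma = 0.5                                   # rms roughness in nm (assumed value)
z = np.linspace(-10 * sigma, 10 * sigma, 4001)
# Normalized Gaussian derivative of the error-function profile
drho_dz = np.exp(-z**2 / (2 * sigma**2)) / np.sqrt(2 * np.pi * sigma**2)

def phi(q):
    """Numerical evaluation of the integral (3.37) with the trapezoidal rule."""
    integrand = drho_dz * np.exp(1j * q * z)
    dz = z[1] - z[0]
    return np.sum((integrand[1:] + integrand[:-1]) / 2) * dz

q_values = np.array([0.5, 1.0, 2.0, 4.0])     # in nm^-1 (assumed sampling)
ratio_numeric = np.array([abs(phi(q))**2 for q in q_values])
ratio_analytic = np.exp(-q_values**2 * sigma**2)   # right hand side of (3.42)
```

Both arrays agree to numerical precision. For the assumed $\sigma = 0.5$ nm, the reflectivity at $q = 4\,\mathrm{nm}^{-1}$ (i.e. $q\sigma = 2$) is already reduced to $e^{-4}$, less than 2 % of the Fresnel value, which illustrates the transition to the optically rough regime.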


3.3 X-ray Mirrors 


Reflective optics in the form of planar and curved mirrors are indispensable tools for synchrotron radiation science. Mirrors are encountered in almost every beamline for the rejection of harmonics, which would also fulfill the Bragg condition of the monochromator. At a fixed grazing angle of incidence $\alpha_i$, higher harmonics impinge above their critical angle $\alpha_c$ and are hence only very weakly reflected, while the fundamental is still below its critical angle and hence has a reflectivity $r$ close to one. Mirror optics are also often preferred as the first optical element to take the white synchrotron beam, since a large surface area under grazing incidence can be used for cooling. In many beamlines, mirrors with moderate curvature are used to focus the beam to the desired position in the experimental hutch, in particular in the horizontal direction where the divergence is large. However, this type of focusing with large mirrors and large focal distances is designed for focal beam sizes of a few mm.

In contrast, micro- or nano-focusing for X-ray microscopy requires much shorter focal distances and much smaller radii of curvature. The most common arrangement of focusing mirrors for this purpose at synchrotron and FEL facilities is known as the Kirkpatrick-Baez (KB) mirror system, which we discuss in this section. Two major properties apply to KB focusing as to mirror optics in general: Firstly, it is non-dispersive, and hence well suited for broad bandpass or photon energy variation. Secondly, the efficiency is high since $r \approx 1$ for $\alpha < \alpha_c$.


3.3.1 Kirkpatrick-Baez Geometry 


A KB system consists of two crossed elliptically shaped mirrors [44], as sketched 
in Fig. 3.7. The mirror length is typically a factor of ten shorter than the large 
beamline mirrors, often around 10cm. The mirror surface is polished to an elliptical 
shape. The ellipse is designed to have the first focal point at the radiation source, for 
example at the undulator exit, and the second at the focal plane of the experiment 
(sample position). Since ellipsoidal surfaces with two principal planes of curvature
are difficult to fabricate, the two mirrors are elliptically curved only in one plane and 
are assembled perpendicular to each other. Rays are sequentially reflected off this 
orthogonal mirror pair, emulating a 3d ellipsoidal mirror surface. 

In the design of a KB system, the following requirements must be considered. The mirrors must have

• a suitable reflectivity, so the grazing angle of incidence $\alpha$ is bounded by the critical angle $\alpha_c \sim$ mrad;

• a homogeneous phase of the reflected beam, so a well-shaped mirror with negligible figure errors is needed to minimise aberrations;

• a well-polished surface, to reduce scattering which leads to artefacts, for example in holographic imaging.

Fig. 3.7 Geometry of KB focusing. A vertically (VFM) and a horizontally focusing mirror (HFM), each with elliptical shape function, are aligned behind each other in orthogonal planes. Orthogonality and Bragg angles must be carefully aligned. Fixed curvature by polishing of the substrate and/or adaptively curved mirrors are both common

Fig. 3.8 Example of an elliptical KB mirror profile, with the surface (blue) polished to elliptical shape as a function of the position on the mirror (x-axis). The mirror is positioned at about 87 m behind the undulator at the P10 beamline of the PETRA III storage ring [45], and has a focal length of about 300 mm. The deviations from the perfect height profile are also shown (red curve) and have an rms roughness of only $\sigma \approx 0.1$ nm. The mirror is made of SiO$_2$ and coated with Rh


The first point limits the numerical aperture (NA) of reflective optics; since the critical angle scales linearly with the X-ray wavelength, the achievable resolution $\lambda/\alpha_c \approx 10$ nm is approximately constant with photon energy, and only depends on the material. This length scale is then just one example of the more general limit postulated by Bergemann et al. for all kinds of X-ray focusing [1], as discussed in the introduction. The second and third points have been solved by technological progress. An important breakthrough has been achieved by the group of Yamauchi with the development of elastic emission machining (EEM) [35], which enabled the fabrication of elliptical surfaces with sub-nm figure errors and a roughness of a few Å, even for mirror lengths of 100 mm and longer. As an example of this technology, Fig. 3.8 shows the height profile and deviations for the horizontally focusing mirror (HFM) of the GINIX instrument at the P10 beamline of the PETRA III storage ring [45].

Geometrically, the elliptical shape yields a perfect point focus, assuming a constant and real-valued reflectivity along the active surface. However, under total reflection, an angle-dependent phase shift $\varphi(\alpha)$ occurs. From the Fresnel reflectivity formula $r = \frac{\sin\alpha - n\sin\alpha'}{\sin\alpha + n\sin\alpha'}$ with $\alpha' = \cos^{-1}(\cos\alpha/n) \in i\mathbb{R}$ for $\alpha < \alpha_c$ and $n < 1$, we obtain as phase shift

$$\varphi(\alpha) = 2\tan^{-1}\!\left(\frac{-\sqrt{2\delta - \sin^2\alpha}}{\sin\alpha}\right), \tag{3.43}$$

where $\alpha$ varies along the mirror’s surface [46]. This phase gradient, $\nabla\varphi(\alpha)$, leads to a small shift of the beam. It is connected to the Goos-Hänchen effect: although totally reflected, the wave enters the medium as an evanescent wave and experiences a small phase lag. Numerically, this lateral shift of the focal spot is only on the order of a few nm. In addition, the index of refraction $n = 1 - \delta + i\beta$ has an imaginary part due to absorption. By way of this imaginary component, the angle $\alpha'$ changes slightly, yielding a second phase contribution to $\varphi(\alpha)$. Again, the effect on the lateral position of the focal spot is in the nm range. The spot size, however, is unaffected. Hence, despite the evanescent wave and the absorption in the reflecting material, an elliptically shaped mirror operating under total external reflection provides efficient point-to-point focusing. However, since $\alpha_i < \alpha_c$ must be fulfilled for all points on the reflecting surface, the numerical aperture is quite limited. To overcome this limitation without severe reduction in $r$, multilayer (ML) coatings are used.
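
As a plausibility check of (3.43), the phase of the complex Fresnel coefficient can be compared with the closed-form expression. The following Python sketch assumes a real index $n = 1 - \delta$ with an illustrative value $\delta = 8 \times 10^{-6}$ (chosen here so that $\alpha_c = \sqrt{2\delta} = 4$ mrad); the function names are hypothetical, and the sign of the phase depends on the sign convention chosen for the imaginary root $n\sin\alpha'$.

```python
import numpy as np

delta = 8.0e-6                    # assumed decrement, gives alpha_c = 4 mrad
alpha_c = np.sqrt(2 * delta)

def fresnel_r(alpha, delta, beta=0.0):
    """Fresnel amplitude reflectivity at grazing angle alpha (radians)."""
    n = complex(1 - delta, beta)
    # n*sin(alpha') = sqrt(n^2 - cos^2(alpha)), from Snell's law
    root = np.sqrt(n * n - np.cos(alpha)**2)
    # convention assumed here: evanescent branch with non-negative imaginary part
    root = np.where(root.imag < 0, -root, root)
    return (np.sin(alpha) - root) / (np.sin(alpha) + root)

def phase_shift(alpha, delta):
    """Closed-form phase shift (3.43), valid for alpha < alpha_c."""
    return 2 * np.arctan(-np.sqrt(2 * delta - np.sin(alpha)**2) / np.sin(alpha))

alphas = np.linspace(0.1, 0.95, 18) * alpha_c
r = fresnel_r(alphas, delta)
phase_numeric = np.angle(r)
phase_formula = phase_shift(alphas, delta)
```

Below the critical angle and without absorption, `abs(r)` equals one and the two phases coincide up to terms of order $\delta^2$; passing a non-zero `beta` to `fresnel_r` shows the additional, small phase contribution due to absorption mentioned in the text.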


3.3.2 Multilayer Mirrors 


For “simple” mirrors based on total external reflection, the numerical aperture is limited by the critical angle $\alpha_c \approx 4$ mrad for hard X-rays and typical coating materials, e.g. at 14 keV and Rh coating. Hence, the focal spot size also has a lower limit of about 50 nm, if we pose reasonable bounds on all other geometrical properties. To enhance the reflectivity at higher angles of incidence, multilayer coatings with alternating high and low density layers are applied. As known from planar multilayers, the first Bragg peak assures high reflectivity at angles of incidence which can easily be a factor of ten higher than $\alpha_c$, depending on the multilayer period $\Lambda$. Common materials for hard X-rays are e.g. W, Mo, Ta for the high density layers, and B$_4$C, C or Si for the low density layers. For a KB system, one expects that these layers and the substrate must follow the shape functions of confocal ellipses, with the X-ray source (undulator) and focal spot as the two focal points. However, due to refraction inside the multilayer structure, the layer shapes need to be slightly modified and varied across the mirror surface [47, 48]. Using such multilayer mirrors with a laterally graded layer period, it was for the first time possible to “Break the 10 nm barrier in hard X-ray focusing” [5].

Fig. 3.9 a Multilayer coatings of KB mirrors approximately follow the shape functions of confocal ellipses. b Definition of the elliptical coordinates used for the Takagi-Taupin theory

In order to design optimal multilayer mirrors, e.g. for the upgraded beamline ID 16a at the ESRF, an analytical treatment of dynamical X-ray diffraction inside such a graded multilayer structure in elliptical geometry has been developed in [49, 50]. Here we briefly describe this wave-optical theory of nano-focusing X-ray multilayer mirrors based on the Takagi-Taupin theory of strained crystals. The geometry and system of coordinates are shown in Fig. 3.9. As a natural choice, we use elliptical coordinates $(t, s)$ given by

$$t := \frac{r_0 + r_1}{2}, \qquad s := \frac{r_0 - r_1}{2}, \tag{3.44}$$

where $r_0$ is the distance of a point from the source $S$, and $r_1$ is the distance of this point to the focus $F$. We derive the Takagi-Taupin (TT) equations from the Helmholtz equation of a scalar field $\psi(s, t)$, here written not with the index of refraction $n$, but with the susceptibility $\chi = n^2 - 1$:


$$\nabla^2\psi(s,t) + k^2\,[1 + \chi(s,t)]\,\psi(s,t) = 0. \tag{3.45}$$


For a (quasi-)periodic structure, we write the susceptibility as a truncated Fourier 
series to first order 


$$\chi(s,t) = \chi_0 + \chi_1 \exp(-2ikt) + \chi_{\bar 1} \exp(2ikt). \tag{3.46}$$


Then, we decompose the field $\psi$ into two components: the incoming wave $\psi_0 \exp(ikr_0)$ diverging from the source, and the reflected wave $\psi_1 \exp(-ikr_1)$ converging to the focus. To rewrite the Helmholtz equation, we need the following expressions:

\begin{align}
\psi(s,t) &= \psi_0 \exp[ik(s+t)] + \psi_1 \exp[ik(s-t)], \tag{3.47}\\
\chi\psi &\approx (\chi_0\psi_0 + \chi_{\bar 1}\psi_1) \exp[ik(s+t)] \tag{3.48}\\
&\quad + (\chi_0\psi_1 + \chi_1\psi_0) \exp[ik(s-t)], \tag{3.49}
\end{align}

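\begin{align}
\nabla^2\psi(s,t) &= \alpha^2\,\partial_s^2\psi + \beta^2\,\partial_t^2\psi + \frac{t\,\partial_t\psi - s\,\partial_s\psi}{t^2 - s^2}, \tag{3.50}\\
\alpha(s,t) &= \sqrt{\frac{c^2 - s^2}{t^2 - s^2}} = \cos\vartheta, \qquad \beta(s,t) = \sqrt{\frac{t^2 - c^2}{t^2 - s^2}} = \sin\vartheta. \tag{3.51}
\end{align}

Here, $\vartheta$ is the local angle of incidence, and $2c$ the distance between source and focus.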

Assuming slowly varying envelopes, $\nabla^2\psi_{0,1}(s,t) \approx 0$, and defining $u_n := k\chi_n/2$, we obtain

\begin{align}
(\alpha^2\partial_s + \beta^2\partial_t)\,\psi_0 &= i\,(u_0\psi_0 + u_{\bar 1}\psi_1) - \frac{\psi_0}{2(t+s)}, \tag{3.52}\\
(\alpha^2\partial_s - \beta^2\partial_t)\,\psi_1 &= i\,(u_0\psi_1 + u_1\psi_0) - \frac{\psi_1}{2(t-s)}. \tag{3.53}
\end{align}

With $\alpha, \beta = \text{const}$, these Takagi-Taupin equations are valid in the flat case; here, these coefficients depend on the coordinates as given above. When applied to curved multilayer mirrors, the bilayer period $\Lambda^B$ of the stacked system following Bragg’s law is given by

$$\Lambda^B = \frac{\lambda}{2\sin\vartheta(s,t)} = \frac{\lambda}{2\beta}. \tag{3.54}$$


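Now we take refraction of the X-ray beam due to the average index of refraction inside the ML structure into account. The modified Bragg condition then reads

$$\Lambda^{mB} = \frac{\lambda}{2\sqrt{n^2 - \cos^2\vartheta}} \approx \frac{\lambda}{2\sqrt{\beta^2 - 2\bar\delta}}, \tag{3.55}$$

where $\bar\delta = (\delta_1 + \delta_2)/2$ is the average decrement, assuming equal thicknesses of the two layers of a bilayer. For $\vartheta > 3\alpha_c \approx 3\sqrt{2\bar\delta}$, a good approximation is given by

$$\Lambda^{mB} \approx \left(1 + \bar\delta/\beta^2\right) \Lambda^B. \tag{3.56}$$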


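The increased layer thickness is accounted for by using a pseudo-Fourier series of $\chi(s,t)$, in which the exponentials are modified according to $\exp(\pm 2ikt) \to \exp\left(\pm 2ikt\,(1 - \bar\delta/\beta^2)\right)$. Replacing further $\psi_1$ by $\tilde\psi_1 := \psi_1 \exp(-2ikt\,\bar\delta/\beta^2)$, the propagation constant $u_0$ in the second TT equation is replaced by the modified propagation constant

$$u_0 \to u_0' = u_0 - \left(\alpha^2\partial_s p - \beta^2\partial_t p\right) k, \qquad p := 2t\,\bar\delta/\beta^2. \tag{3.57}$$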


Assuming a constant $\vartheta$, it can be shown that $u_0' = -u_0$; in other words, while the first TT equation gives rise to a phase lag due to refraction, the modified second equation now yields an anti-phase lag of the reflected wave, in fact correcting for refraction. In the curved case, the next-to-leading order term reads

$$u_0' \approx u_0 + 2\bar\delta k \left[1 - 2\,\frac{\alpha^2\,\Delta t}{\beta^2\,(t+s)}\right], \qquad \Delta t := t - t_0, \tag{3.58}$$


with $t = t_0$ along the entrance surface. For realistic parameters, this curvature term leads to a small numerical correction of the required bilayer spacing.

Reflectivity curves in dynamical diffraction are not symmetric; in particular, the 
peak intensity does not occur at the nominal Bragg angle. For further numerical 
optimisation, a scaling factor f interpolating between Bragg and modified Bragg 
layer spacing is now introduced; we define 


$$\Lambda(f) := \Lambda^B + f \times \left(\Lambda^{mB} - \Lambda^B\right), \qquad f \in \mathbb{R}. \tag{3.59}$$
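
The layer spacings entering this optimisation can be evaluated directly. The Python sketch below implements the Bragg period, the refraction-corrected period, and their interpolation with the factor $f$ of (3.59). The function names and the numerical values (wavelength, angle, mean decrement) are illustrative assumptions and are not taken from the simulation discussed here; absorption is neglected.

```python
import numpy as np

def bragg_period(wavelength, theta):
    """Bilayer period from Bragg's law, cf. (3.54)."""
    return wavelength / (2 * np.sin(theta))

def modified_bragg_period(wavelength, theta, delta_mean):
    """Refraction-corrected bilayer period, cf. (3.55); real index assumed."""
    n_squared = (1 - delta_mean)**2
    return wavelength / (2 * np.sqrt(n_squared - np.cos(theta)**2))

def interpolated_period(wavelength, theta, delta_mean, f):
    """Interpolation between both spacings with scaling factor f, cf. (3.59)."""
    lam_b = bragg_period(wavelength, theta)
    lam_mb = modified_bragg_period(wavelength, theta, delta_mean)
    return lam_b + f * (lam_mb - lam_b)

# Assumed example: 17 keV photons, 20 mrad local angle, mean decrement 5e-6
wavelength = 0.0729      # nm
theta = 0.020            # rad
delta_mean = 5.0e-6

lam_b = bragg_period(wavelength, theta)
lam_mb = modified_bragg_period(wavelength, theta, delta_mean)
relative_correction = lam_mb / lam_b - 1
```

For these assumed numbers the angle is well above $3\alpha_c$, and the relative correction of about 1.3 % is close to the estimate $\bar\delta/\sin^2\vartheta$ of (3.56). Calling `interpolated_period(..., f=0.9)` gives the spacing for the value of $f$ quoted below.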


The TT system of coupled differential equations is solved numerically for different values of the parameter $f$. Figure 3.10 summarises a simulation of a ML mirror for the ESRF beamline ID-16a. The focal spot size $\Delta$ (red line), the peak intensity in the focus (blue points), and the standard deviation of the reflected phase (green dashes) are shown as a function of the optimisation parameter $f$. Based on the simulations, a value of $f = 0.9$ yields the best results, with a theoretical focal spot size of about 5 nm (FWHM).

Fig. 3.10 Simulated focus sizes for the geometry of the ESRF KB nano-focus upgrade beamline: a horizontal ML and b vertical ML, as a function of the Bragg modification factor $f$. The focus sizes (red curve with reddish error bands) have been obtained from sinc$^2$-fits to the intensity in the focus. The standard deviation of the reflected phase (along the ML surface) is shown in green on a logarithmic scale; the peak intensity in the focus is shown in blue. The dotted line shows the diffraction limit. The simulation has been carried out for a point source. From [51]


3.4 X-ray Waveguides 


Compared to other spectral domains, notably that of visible and infrared light, waveguide optics is much less developed in the X-ray range. Total reflection in a thin film of low electron density surrounded by high electron density is the basis for guiding X-ray radiation. The first demonstrations of waveguiding effects for X-rays used propagation in planar (straight) thin film structures [33, 52-56], followed by the development of two-dimensional channel waveguides [30, 57], which posed significant fabrication challenges until recently [58-60]. Progress in waveguide fabrication has led to a usable exit flux exceeding in some cases $10^9$ photons per second [58]. If optimized for small beam size, X-ray waveguides with beam confinement of sub-10 nm (FWHM) have also been demonstrated [32]. In the context of this volume, the use of X-ray waveguides to create a monochromatic and fully coherent secondary quasi-point source is of particular importance. This coherent point source is ideally suited for X-ray holography and coherent imaging techniques. This has resulted in (holographic) propagation imaging at unprecedented resolution and image quality [61]. Figure 3.11 illustrates the basic geometry of using X-ray waveguides to record in-line X-ray holograms.

Fig. 3.11 Basic schematic of using a waveguide exit beam for holographic X-ray imaging. By selecting the waveguide to sample distance, objects can be imaged in a full field configuration and geometrically magnified holograms of nanoscale objects can be recorded. The exit field behind the sample is reconstructed by phase retrieval algorithms, which are often more robust in this regime, compared to far-field coherent diffractive imaging


3.4.1 Waveguide Modes: The Basics 


X-ray waveguides can be treated as a special case of the general theory of electromagnetic waveguides, as presented in the classic textbook by Marcuse [62], which we follow here. The only particularities associated with waveguiding in the X-ray regime derive from the nature of the X-ray index of refraction and the short wavelength. Adapted to the X-ray case, the notation and parameterisations used here closely follow previous original work presented in [54, 57, 63]. We start from Maxwell’s equations, written down for an isotropic, linear, nonconducting, and nonmagnetic medium as


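\begin{align}
\nabla \times \mathcal{E} &= -\frac{\partial \mathcal{B}}{\partial t}, \tag{3.60}\\
\nabla \times \mathcal{H} &= \frac{\partial \mathcal{D}}{\partial t}, \tag{3.61}\\
\nabla \cdot \mathcal{D} &= 0, \tag{3.62}\\
\nabla \cdot \mathcal{B} &= 0, \tag{3.63}
\end{align}

where $\mathcal{E}$ and $\mathcal{H}$ denote the electric and magnetic field, $\mathcal{B} = \mu_0\mathcal{H}$ the magnetic induction, and $\mathcal{D} = \epsilon_0\epsilon\,\mathcal{E} = \epsilon_0 n^2(\mathbf{r})\,\mathcal{E}$ the electric displacement. We then take the curl of (3.60)

$$\nabla \times (\nabla \times \mathcal{E}) = -\mu_0\,\partial_t (\nabla \times \mathcal{H}) = -\mu_0\epsilon_0 n^2\,\partial_t^2 \mathcal{E} \tag{3.64}$$

and use $\nabla \times (\nabla \times \mathcal{E}) = \nabla(\nabla\cdot\mathcal{E}) - \nabla^2\mathcal{E}$ to obtain the source-free (homogeneous) wave equation

$$\nabla^2\mathcal{E} - \mu_0\epsilon_0 n^2\,\partial_t^2 \mathcal{E} = 0, \tag{3.65}$$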


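which holds for $\nabla\cdot\mathcal{E} = 0$. This is warranted for a section-wise constant index $n$, or in the approximation of a weakly varying index, since in this case we can set $\nabla\cdot\mathcal{E} \approx 0$, as can be verified from

$$0 = \nabla\cdot\mathcal{D} = \epsilon_0 (\nabla n^2)\cdot\mathcal{E} + \epsilon_0 n^2\,\nabla\cdot\mathcal{E} \approx \epsilon_0 n^2\,\nabla\cdot\mathcal{E}. \tag{3.66}$$

Analogous to (3.65), we can also obtain the wave equation for the magnetic field

$$\nabla^2\mathcal{H} = \mu_0\epsilon_0 n^2\,\partial_t^2 \mathcal{H}. \tag{3.67}$$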


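Next we consider a solution of Maxwell’s equations which has the particular form of a guided wave, i.e.

\begin{align}
\mathcal{E} &= \mathbf{E}(\mathbf{r}_\perp)\, e^{i(\omega t - \mathbf{k}\cdot\mathbf{r})}, \tag{3.68}\\
\mathcal{H} &= \mathbf{H}(\mathbf{r}_\perp)\, e^{i(\omega t - \mathbf{k}\cdot\mathbf{r})}, \tag{3.69}
\end{align}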

where $\omega$ is the angular frequency, $\mathbf{k} = k\,\mathbf{e}_k$ the wave vector with magnitude (wave number) $2\pi/\lambda$, and $\mathbf{r}_\perp = \mathbf{r} - (\mathbf{e}_k\cdot\mathbf{r})\,\mathbf{e}_k$ is a position vector perpendicular to $\mathbf{k}$. In other words, in a guided mode we require the field to be stationary with respect to the propagation axis, i.e. $\mathbf{E}$ and $\mathbf{H}$ are only functions of the coordinates perpendicular to the propagation axis. Unless they are constant (plane wave), this requires the presence of matter; more precisely, a distribution of the index of refraction $n(\mathbf{r}_\perp)$ with translational symmetry along $\mathbf{k}$, which is chosen such that the field is guided along the propagation axis while being confined in the orthogonal direction(s).

Fig. 3.12 Sketch of a simple slab waveguide geometry. The guiding layer material with high index of refraction is sandwiched by a low index surrounding material (“cladding”). For X-rays, air/vacuum with $n_1 \approx 1$ is an ideal guiding medium, and the index of refraction of the metal or semiconductor cladding is given by $n_2 = 1 - \delta$

A simple example is sketched in Fig. 3.12, with $z$ as the propagation axis, and a stepwise constant index of refraction profile $n(x)$


$$n(x) = \begin{cases} n_1 & \text{if } -d/2 \le x \le d/2,\\ n_2 & \text{else}, \end{cases} \tag{3.70}$$

describing a simple planar waveguide geometry with a guiding layer of refractive index $n_1$ and thickness $d$ (guiding core) sandwiched between two semi-infinite cladding regions of refractive index $n_2 < n_1$. The profile function $n(x)$ parameterizes a planar waveguide with one-dimensional beam confinement (1DWG), while two-dimensional confinement would require a corresponding two-dimensional profile function $n(x, y)$, describing for example a channel waveguide (2DWG) of cylindrical or rectangular cross section. For the given geometry of a planar waveguide, we hence have


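\begin{align}
\mathcal{E}_j &= E_j(x)\, e^{i(\omega t - \beta z)}, \tag{3.71}\\
\mathcal{H}_j &= H_j(x)\, e^{i(\omega t - \beta z)}. \tag{3.72}
\end{align}

Inserting this ansatz in (3.60) and (3.61) yields six differential equations for the field components, which separate into two uncoupled sets, describing the transverse electric (TE) modes (modes without electric component in propagation direction)

\begin{align}
i\beta E_y &= -i\omega\mu_0 H_x, \tag{3.73}\\
\partial_x E_y &= -i\omega\mu_0 H_z, \tag{3.74}\\
-i\beta H_x - \partial_x H_z &= i\omega\epsilon_0 n^2(x)\, E_y, \tag{3.75}
\end{align}

and the transverse magnetic (TM) modes (modes without magnetic component in propagation direction)

\begin{align}
-i\beta E_x - \partial_x E_z &= -i\omega\mu_0 H_y, \tag{3.76}\\
i\beta H_y &= i\omega\epsilon_0 n^2(x)\, E_x, \tag{3.77}\\
\partial_x H_y &= i\omega\epsilon_0 n^2(x)\, E_z. \tag{3.78}
\end{align}

For TE modes $E_x, E_z, H_y = 0$ (for TM modes $E_y, H_x, H_z = 0$). Expressing the field components along $x$ and $z$ by those in $y$, we obtain for the two sets of modes

\begin{align}
\partial_x^2 E_y + \left(k^2 n^2(x) - \beta^2\right) E_y &= 0, \tag{3.79}\\
\partial_x^2 H_y + \left(k^2 n^2(x) - \beta^2\right) H_y &= 0. \tag{3.80}
\end{align}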


These equations are sometimes denoted as reduced wave equations. For X-rays, the propagation is extremely forward directed, i.e. the internal reflection angles are on the order of a few mrad, much smaller than the Brewster angle. For this reason, the TM and TE solutions are degenerate, and scalar diffraction theory holds. Correspondingly, it is sufficient to consider a single scalar field $\Psi$. In fact, we could also start directly from the scalar wave equation

$$\nabla^2\Psi - \frac{n^2}{c^2}\frac{\partial^2\Psi}{\partial t^2} = 0 \qquad \forall\, \Psi \in \{E_x, E_y, E_z, H_x, H_y, H_z\}, \tag{3.81}$$

with $c = 1/\sqrt{\epsilon_0\mu_0}$ the speed of light, and $n = \sqrt{\epsilon_r\mu_r}$ the refractive index, keeping in mind that the components are in general not independent. For forward directed propagation of X-rays, the scalar wave equation (3.81) is an excellent approximation. The field $\Psi$ can be written as a superposition of monochromatic fields $\psi_\omega$ (spectral decomposition)

$$\Psi(\mathbf{r}, t) = \frac{1}{2\pi}\int_0^\infty \psi_\omega(\mathbf{r})\, e^{i\omega t}\, d\omega. \tag{3.82}$$

If stationary quasi-monochromatic waves are considered, i.e. $\Psi \to \psi_\omega$ and $n \to n_\omega$, the time dependence is harmonic,

$$\psi_\omega(\mathbf{r}, t) = U(\mathbf{r})\, e^{i\omega t}, \qquad U(\mathbf{r}) \in \mathbb{C}, \tag{3.83}$$


and we can write down a differential equation for the complex amplitude $U(\mathbf{r})$ alone by inserting (3.83) in (3.81):

\begin{align}
\left(\nabla^2 U(\mathbf{r})\right) e^{i\omega t} + \frac{n^2\omega^2}{c^2}\, U(\mathbf{r})\, e^{i\omega t} &= 0 \tag{3.84}\\
\Rightarrow\quad \nabla^2 U(\mathbf{r}) + n^2 k_0^2\, U(\mathbf{r}) &= 0. \tag{3.85}
\end{align}

Equation (3.85) is the scalar Helmholtz equation (HE). Here, the wave number in vacuum is given by $k_0 = \omega/c = 2\pi/\lambda$. Another notation is $k(\mathbf{r}) = n(\mathbf{r})\,k_0$, where $k(\mathbf{r})$ is the wave number in a medium. To simplify the notation, we will write $k$ for $k_0$ in the following, so that $k$ refers to the absolute value of the wave number in vacuum. The component $k_z$ will be denoted as $k_z = \beta$. Again, we use a separation ansatz for guided modes

$$U(x, z) = u(x)\, e^{-i\beta z}, \tag{3.86}$$

where $\beta$ is called the propagation constant. Insertion in (3.85) then yields the one-dimensional reduced wave equation for $u(x)$,

$$\frac{d^2}{dx^2}\,u(x) + \left(n^2(x)\,k^2 - \beta^2\right) u(x) = 0, \tag{3.87}$$


which has the same form as (3.80). Hence we can either first work with Maxwell’s equations and use the scalar approximation later in the reduced Helmholtz equation, or start from the scalar wave equation, and arrive at the same result. Note that in order to have $\beta$ real, we have to assume the refractive index to be real and thus at least initially ignore absorption. More generally, the modes will also be affected by the imaginary part of the index, but in practice it is sufficient to treat absorption a posteriori by an effective (weighted) absorption coefficient for each mode. Even though this does not matter in scalar approximation, $u(x)$ could be taken to represent the horizontal component of the electric field, considering the so-called transverse electric (TE) modes of the waveguide. This requires $u(x)$ to be continuous at the interfaces. Furthermore, for guided modes we require the field to vanish far inside the cladding, i.e., $u(x)$ must approach zero in the limit $|x| \to \infty$. For symmetric potential functions (here: index of refraction profiles), the eigenfunctions (modes) have defined parity, i.e. are symmetric or anti-symmetric. A general form of a symmetric function which solves (3.87) and which does not diverge is given by

$$u(x) = \begin{cases} A\cos(\kappa x) & \text{if } 0 \le |x| \le d/2,\\ C\, e^{-\gamma|x|} & \text{if } |x| > d/2, \end{cases} \tag{3.88}$$


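where $A$ and $C$ are constants, and where $\kappa^2 = k^2n_1^2 - \beta^2$ and $\gamma^2 = \beta^2 - k^2n_2^2$ follow from inserting (3.88) in (3.87). Requiring $u(x)$ and its derivative $u'(x)$ to be continuous functions at $x = \pm d/2$, we get

\begin{align}
A\cos(\kappa d/2) &= C\, e^{-\gamma d/2}, \tag{3.89}\\
-A\kappa\sin(\kappa d/2) &= -\gamma\, C\, e^{-\gamma d/2}. \tag{3.90}
\end{align}

Dividing (3.90) by (3.89), we obtain a transcendental equation

$$\kappa\tan\left(\frac{\kappa d}{2}\right) = \gamma. \tag{3.91}$$

Symmetric modes are found by solving this transcendental equation. For anti-symmetric modes

$$u(x) = \begin{cases} B\sin(\kappa x) & \text{if } 0 \le |x| \le d/2,\\ C\,\mathrm{sgn}(x)\, e^{-\gamma|x|} & \text{if } |x| > d/2, \end{cases} \tag{3.92}$$

we have correspondingly

$$-\kappa\cot\left(\frac{\kappa d}{2}\right) = \gamma. \tag{3.93}$$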
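Using the definitions

$$\xi := \frac{\kappa d}{2} = \sqrt{k^2n_1^2 - \beta^2}\;\frac{d}{2} \tag{3.94}$$

and

$$V := \sqrt{n_1^2 - n_2^2}\; k d \approx \sqrt{2\delta_2 - 2\delta_1}\; k d, \tag{3.95}$$

the transcendental equations can be rewritten as

$$\xi\tan(\xi) = \sqrt{\frac{V^2}{4} - \xi^2} \tag{3.96}$$

for symmetric modes and

$$-\xi\cot(\xi) = \sqrt{\frac{V^2}{4} - \xi^2} \tag{3.97}$$

for antisymmetric modes, respectively. The transcendental equations determine a discrete set of modes $\xi_m$, with $0 \le m \le N - 1$. The total number of guided modes $N$ is given by

$$N = \left[\frac{V}{\pi}\right], \tag{3.98}$$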

where $[\,\cdot\,]$ denotes the Gauss bracket (here: rounding up to the next integer). The recipe to compute a mode is then to solve the transcendental equation and to compute the parameters in the sequence $\xi_m \to \kappa_m \to \beta_m \to \gamma_m$, to obtain $u(x)$. The smallest $\xi_0$ which solves (3.96) determines the fundamental mode

$$u_0(x) = \begin{cases} \cos(\kappa_0 x) & \text{if } |x| \le d/2,\\ \cos(\kappa_0 d/2)\, e^{-\gamma_0 (|x| - d/2)} & \text{else.} \end{cases}$$

Fig. 3.13 Graphical representation of the transcendental mode equation with the waveguide parameter $V$. Each intersection corresponds to a mode. Symmetric modes are indicated in blue, anti-symmetric modes in orange. For the given $V = 8.44$, the waveguide supports $N = [V/\pi] = 3$ modes

Fig. 3.14 Mode amplitudes (left) and intensities (right), corresponding to the solution of the transcendental equation shown in Fig. 3.13. The cladding is shaded in gray
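
The recipe for computing the modes can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not code from the cited works: the function names `guided_modes` and `mode_parameters` and the plain bisection solver are choices made here, and the bracketing of the root of mode $m$ in the interval $(m\pi/2,\ \min[(m+1)\pi/2,\ V/2])$ follows from the shape of the two sides of (3.96) and (3.97).

```python
import numpy as np

def _bisect(f, lo, hi, iterations=100):
    """Plain bisection; assumes f(lo) < 0 <= f(hi) with a single sign change."""
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def guided_modes(V):
    """Roots xi_m of (3.96) (even m) and (3.97) (odd m) for waveguide parameter V."""
    def rhs(xi):
        return np.sqrt(max(V**2 / 4 - xi**2, 0.0))

    def sym(xi):       # left minus right hand side of (3.96)
        return xi * np.tan(xi) - rhs(xi)

    def antisym(xi):   # left minus right hand side of (3.97)
        return -xi / np.tan(xi) - rhs(xi)

    roots = []
    m = 0
    while m * np.pi / 2 < V / 2:
        lo = m * np.pi / 2 + 1e-12                      # just above the branch start
        hi = min((m + 1) * np.pi / 2 - 1e-12, V / 2)    # below the pole, at most V/2
        roots.append(_bisect(sym if m % 2 == 0 else antisym, lo, hi))
        m += 1
    return roots

def mode_parameters(xi, d, k, n1, n2):
    """Sequence xi -> kappa -> beta -> gamma for one mode (same units for d and 1/k)."""
    kappa = 2 * xi / d
    beta = np.sqrt(k**2 * n1**2 - kappa**2)
    gamma = np.sqrt(beta**2 - k**2 * n2**2)
    return kappa, beta, gamma
```

For $V = 8.44$, `guided_modes(8.44)` returns three roots, in line with the number of modes quoted in Fig. 3.13. With `kappa` and `gamma` from `mode_parameters`, the mode profile follows from (3.88) or (3.92), depending on the parity of $m$.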


In order to interpret a mode in a geometric optical picture, it is helpful to consider the complete field in the guiding layer, e.g. of a symmetric mode with mode envelope $u(x) = \cos(\kappa x)$ (Fig. 3.14),

$$U(x, z) = u(x)\, e^{-i\beta z} = \frac{1}{2}\left(e^{i(\kappa x - \beta z)} + e^{-i(\kappa x + \beta z)}\right). \tag{3.99}$$

The right hand side corresponds to two internally reflected plane waves (guided by total reflection), or beams in the geometric optical model, with wave vectors

$$\mathbf{k}_\pm = \begin{pmatrix} \pm\kappa \\ 0 \\ \beta \end{pmatrix}, \tag{3.100}$$

as sketched in Fig. 3.17.


3.4.2 Coupling and Propagation 


The modes $u_m(x)$ are eigenfunctions of the waveguide potential. For a rectangular profile they consist of a sine or cosine term with $m + 1$ antinodes in the guiding layer, and an exponentially decaying evanescent wave in the cladding, as derived above. For more general potential shapes, the mode function can also be found numerically by integration via Numerov’s method (shooting method) [64]. For given geometry and boundary conditions, propagation can be calculated by finite difference (FD) calculations as presented in Chap. 2, and the different modes can be dissected by means of Fourier transformation along $z$, see Fig. 3.15. Neglecting non-guided modes, the propagating field and the corresponding interference effects are well described by a linear combination of all $N$ guided modes

$$\psi(x, z) = \sum_{m=0}^{m_\text{max}} c_m\, u_m(x)\, \exp(-i\beta_m z). \tag{3.101}$$

Fig. 3.15 a Multi-modal propagation inside a planar waveguide with thin film sequence Ge/Mo [$d_i$ = 30 nm]/C [$d$ = 35 nm]/Mo [$d_i$ = 30 nm]/Ge, simulated for 17.5 keV and front coupling (plane wave). The intensity distribution is plotted in the range of 221-261 µm behind the waveguide entrance, showing a mode beating along the propagation direction $z$. b Intensity profiles corresponding to the dashed lines in a, illustrating the interference due to multi-modal propagation. c A Fourier transformation with respect to $z$ reveals both the shape of the guided modes (vertical profile), and the propagation constant (proportional to the horizontal offset). d FWHM of the simulated near-field distribution (top) and far-field distribution (bottom) as a function of $z$. Adapted from [32]
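
The mode beating follows directly from the superposition (3.101): the interference term between two modes $m$ and $m'$ varies along $z$ as $\exp[-i(\beta_m - \beta_{m'})z]$ and hence repeats with the period $L = 2\pi/|\beta_m - \beta_{m'}|$. A minimal Python sketch of this estimate is given below; the function name `beat_length` and all numbers (guide width, photon wavelength, and the two mode parameters $\xi$) are assumptions chosen for illustration and are not taken from the simulation of Fig. 3.15.

```python
import numpy as np

def beat_length(xi_a, xi_b, d, k, n1=1.0):
    """Beating period 2*pi/|beta_a - beta_b| of two guided modes.

    xi_a, xi_b : mode parameters xi = kappa*d/2 of the two modes
    d          : guiding layer thickness
    k          : vacuum wave number (inverse unit of d)
    n1         : refractive index of the guiding layer
    """
    beta_a = np.sqrt((k * n1)**2 - (2 * xi_a / d)**2)
    beta_b = np.sqrt((k * n1)**2 - (2 * xi_b / d)**2)
    return 2 * np.pi / abs(beta_a - beta_b)

# Assumed example values (hypothetical, for illustration only)
d = 35.0                       # nm
k = 2 * np.pi / 0.0709         # nm^-1, wavelength 0.0709 nm
xi_0, xi_2 = 1.3, 3.8          # two symmetric modes, e.g. from solving (3.96)

L = beat_length(xi_0, xi_2, d, k)

# Paraxial estimate: beta_m ~ k*n1 - kappa_m^2 / (2*k*n1)
kappa_0, kappa_2 = 2 * xi_0 / d, 2 * xi_2 / d
L_paraxial = 4 * np.pi * k / (kappa_2**2 - kappa_0**2)
```

For these assumed values the beating period is a few tens of micrometres, i.e. many orders of magnitude larger than the wavelength, which reflects the extremely forward directed propagation.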


In front-coupled waveguides, the coefficient $c_m$ is given by an overlap integral of the incident field $\psi_\text{in}$ and $u_m$ [29, 52]

$$c_m = \frac{1}{\|u_m\|^2} \int \psi_\text{in}(x)\, u_m(x)\, dx.$$


Absorption can be accounted for by a factor $\exp(-\mu_{\mathrm{eff},m}\, z/2)$ for each mode on the right-hand side
of equation (3.101), with an “effective linear absorption coefficient” $\mu_{\mathrm{eff},m}$ given by
a mode-weighted average of the absorption coefficient profile $\mu(x)$ [65]

Fig. 3.16 Field intensity distribution (logarithmic color code) in a a nearly mono-modal, b a
multi-modal air/silicon waveguide, simulated for plane wave incoming 8 keV radiation with unit intensity.
Due to higher absorption of the m = 1 mode, only the fundamental mode m = 0 persists in the
d = 16 nm guide, while N = 5 modes propagate in the d = 96 nm guide. c By tapering the exit
intensity can be increased, and single mode radiation is achieved at the exit. The intensity gain
between a and c for same exit width is directly visible


$$\mu_{\mathrm{eff},m} = \frac{1}{\|u_m\|^2} \int |u_m(x)|^2\, \mu(x)\, dx.$$
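The mode-weighted average can be evaluated directly on a sampled grid. The following sketch assumes a generic model profile (a cosine in an empty core with exponential tails in the cladding); the decay constant, transverse wavenumber and cladding absorption are placeholder values, not solutions of the waveguide equation or data from the text.

```python
import numpy as np

def effective_absorption(u, mu, x):
    """Mode-weighted average: int |u|^2 mu dx / int |u|^2 dx (trapezoidal rule)."""
    w = np.abs(u) ** 2
    dx = np.diff(x)
    num = np.sum(0.5 * (w[1:] * mu[1:] + w[:-1] * mu[:-1]) * dx)
    den = np.sum(0.5 * (w[1:] + w[:-1]) * dx)
    return num / den

# Hypothetical example: empty core of width d, absorbing cladding outside.
d = 16.0                  # nm, guiding layer width
kx = 0.9 * np.pi / d      # 1/nm, assumed transverse wavenumber in the core
kappa = 0.05              # 1/nm, assumed evanescent decay constant
mu_clad = 1.0e-5          # 1/nm, assumed cladding absorption coefficient

x = np.linspace(-200.0, 200.0, 8001)
inside = np.abs(x) <= d / 2
u = np.where(inside,
             np.cos(kx * x),
             np.cos(kx * d / 2) * np.exp(-kappa * (np.abs(x) - d / 2)))
mu = np.where(inside, 0.0, mu_clad)

mu_eff = effective_absorption(u, mu, x)
print(f"mu_eff / mu_clad = {mu_eff / mu_clad:.3f}")
```

Because the core does not absorb in this example, the ratio printed is simply the fraction of the mode intensity that resides in the cladding.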


For a vacuum guide, only the intensity fraction in the cladding contributes to the 
absorption of the mode. The transition from multi-modal to mono-modal regime as 
a function of guiding layer thickness d is illustrated by Fig. 3.16a, b. Note that d can 
also be tapered along the optical axis as in (c) to concentrate the field. Instead of 
coupling from the side, a beam can also be coupled in through the cladding, via the 
so-called resonant beam coupler (RBC) geometry, see Fig. 3.17a. In this case, modes 
can be excited selectively, even if the waveguide supports multiple modes. Figure 3.17
also shows a simulation depicting the position of a waveguide in the focal plane of a 
KB-mirror. By computing the propagation for different incoming realisations of the 
(stochastic) field, the guiding and filtering of a waveguide can be studied [66]. 


3.4.3 Fabrication and Characterisation of X-ray Waveguides 


To isolate a guided X-ray beam with a cross section down to about 10 nm close to the 
fundamental limit [1], long channels are needed with aspect ratios (length to width) 
in the range of 10—106, depending on the photon energy E and cross section d. 
This is because the radiation entering at the sides of the over-illuminated channel

Fig. 3.17 Sketch of different coupling geometries. a Resonant beam coupling (RBC). The
waveguide is illuminated by an incoming plane wave under grazing incidence with $\alpha_i$ tuned to a mode.
Modes can be excited in an interval $\alpha_c^{\mathrm{core}} < \alpha_i < \alpha_c^{\mathrm{cladd}}$, where $\alpha_c^{\mathrm{core}}$ and $\alpha_c^{\mathrm{cladd}}$ denote the critical
angles of total reflection for waveguide core and cladding, respectively. The mode is excited via
an evanescent wave in the top cladding. b Front coupling scheme, illustrated in a ray-optical
picture, with the mode formed by up and down reflected rays in the guiding core, according to (3.100).
c Finite difference simulation of coupling a pre-focused beam (KB mirrors) into a silicon-air
waveguide, propagation and out-coupling, adapted from [66]


entrance (radiative modes) has to be absorbed in the cladding material. Not only the 
small cross section, but also the high aspect ratios impose a significant challenge in 
fabrication. Waveguide structures for one-dimensional beam confinement by planar 
waveguides (1DWG) are easily obtained by thin film deposition techniques, but most 
applications require two-dimensional waveguides (2DWG). Using guiding channels 
of polymer structured by e-beam lithography and coated with metal or semiconductor 
cladding, 2DWGs were first realized in [57] and later improved by [30]. An alter- 
native fabrication scheme based on dry etching of channels into silicon wafers and 
subsequent capping by wafer bonding makes it possible to employ an empty guiding 
core (air or vacuum) and hence to minimize absorption in particular for lower photon 
energies [60]. This has enabled a waveguide exit flux on the order of $10^8$ ph/s (P10
beamline of the PETRA III storage ring at DESY) [67].

Figure 3.18 illustrates the fabrication of waveguide channels in silicon by e-beam 
lithography and subsequent wafer bonding, according to [60]. A spin-coated poly- 
methyl-methacrylate (PMMA) is used as positive e-beam resist. The desired pattern 
of an array of waveguide channels is written by moving an interferometric laser stage 
below a stationary electron beam, in order to achieve the required channel length (of 
a few mm) without stitching errors. The developed resist then provides the etching
mask for pattern transfer into the semiconductor substrate by reactive ion etching 
(RIE). Subsequently, the mask is removed and the channels are capped by a second
wafer via hydrophilic wafer bonding [60]. An alternative fabrication scheme has been 
demonstrated in [31], where two planar waveguides (1DWG), which each confine 
the beam in an orthogonal direction, were combined in a crossed geometry to form an 
effective two-dimensional quasi-point source for holographic imaging. This crossed

Fig. 3.18 Fabrication of lithographic waveguides. a Sketch of waveguide processing sequence:
resist deposition, e-beam exposure, reactive ion etching, mask removal, and finally wafer bonding. 
b Schematic of air-filled channel capped by a top wafer bonded to the substrate. c, d, e SEM 
micrographs of waveguide channel entrances. f Photograph of waveguide chip as cut by the wafer 
dicing machine. From [60] 


two-dimensional waveguide (c2DWG) scheme is compatible with fabrication by 
thin layer deposition. Hence, smaller guiding layers, a wider range of materials, and 
more complex layer sequences can be realized, including a two-component cladding 
optimized for high transmission [68]. Using for example an interlayer made of Mo, 
placed between the guiding core (C), and a high absorption cladding (Ge), this 
scheme provides excellent waveguides for the photon energy range between the Ge 
L-edges and the Mo K-edge, see Fig. 3.19. Figure 3.20 shows the measured far-field 
pattern of a Mo/C/Mo c2DWG system with guiding layer thickness d = 35 nm. The 
far-field exhibits a relatively uniform intensity distribution in the center along with 
a characteristic arrangement of fringes in the tails. The large divergence reflects the 
small focal width of the waveguide, as quantified by reconstruction of the near-field
intensity distribution with the error reduction (ER) algorithm [31]. The calculation
of the field’s auto-correlation function by Fourier transformation of the far-field
intensity can be used as a verification, since its width should give the same value as the
auto-correlation of the ER result.


3.4.4 Advanced Waveguide Configurations 


Waveguide optics enables a variety of optical functions, such as filtering, confining, 
guiding, coupling or splitting of beams. Advanced X-ray waveguides now begin to 
exploit such advanced functionalities, beyond simply filtering the mode structure
of a synchrotron beam, which is already well established. Based on an array of 
waveguide channels, X-ray optics on a chip has been proposed in [69]. Beam concentration
by tapering [58], guiding beams around a bend [69], and beam splitting

Fig. 3.19 Crossed planar waveguides. a Schematic. b Profiles Re(n) and Im(n) of the index of
refraction $n = 1 - \delta + i\beta$, for photon energy E = 17.5 keV. Transmission of the guided modes
is increased by the high $\delta$ but relatively low $\beta$ of Mo. c Scanning electron microscopy (SEM)
image (magnification 52.85 kx) showing the Mo/C/Mo layers in between Ge. The In52Sn48 alloy
serves as bond material to an additional Ge cap wafer. d SEM micrograph at 200 kx magnification.
From [31]

Fig. 3.20 X-ray waveguide beam with cross section at around 10 nm. a Fraunhofer far-field
diffraction pattern of a beam exiting a crossed two-dimensional waveguide system (c2DWG) consisting
of Mo/C/Mo layers, recorded with a pixel detector (Pilatus, Dectris Inc.) at a distance of about 5 m
behind the waveguide exit (E = 13.8 keV, logarithmic scale, scale bar 0.02 Å$^{-1}$, 100 s dwell time).
The two orthogonal slices had a guiding layer thickness of 35 nm, and a total thickness of l = 490 µm
(vertical slice: $l_1$ = 270 µm, horizontal slice: $l_2$ = 220 µm). A maximum (output) photon flux of
$1.0 \times 10^8$ ph/s in the c2DWG beam was achieved by focusing a KB beam onto the waveguide
entrance. b The near-field intensity distribution in the effective focal plane, obtained by inverting
the diffraction pattern based on phase reconstruction by the error reduction (ER) algorithm
(logarithmic scale, scale bar 20 nm). A high beam confinement in the effective confocal plane is achieved
by multi-modal interference. c Line scans with corresponding Gaussian fits yield a FWHM of 10.7
and 11.4 nm in horizontal and vertical direction, respectively. Adapted from [32]


for nano-interferometry [59], have also been demonstrated. In contrast to refractive
or diffractive optics, X-ray waveguides are non-dispersive and can thus support
a broader bandpass. An advanced fabrication scheme with improved lithography, etching
and wafer bonding steps has now paved the way to develop this field further [59].

Fig. 3.21 Miniaturized beam splitter based on X-ray waveguides. a Finite differences simulation
of a beam splitter. b Top view SEM image of a splitting structure before wafer bonding. Scale bar
denotes 1 µm. c-f SEM images of the exit side of beam splitters with different spacings S. Scale
bars denote 100 nm. g Schematic of the experimental geometry showing the coupling of the focused
X-ray beam into the entrance of the beam splitter, the subsequent guiding in the two channels, the
free space propagation behind the chip, and finally the far-field detector at a distance D. The
far-field pattern shows the characteristic double slit interference pattern, modulated with features of
the waveguide modal structure. Arrows mark bifurcations in the interference fringes (fork-shaped
structures). Length and angles are not to scale. h Enlarged view of the interference pattern with a
sinusoidal fit to the intensity oscillations. i Scan in y-direction indicating the position of different
beam splitters which have all been defined on the same chip with different geometric parameters,
and which can be selected by translating the chip in the FZP focus. Detailed scan profile of a single
channel with a width (FWHM) of 282.6 nm giving an upper limit for the beam size in the horizontal
direction. From [59]


Multiplexed beamlets can be particularly useful for coherent imaging [70], and pos- 
sibly also X-ray quantum optical experiments [71]. As an example, the function of 
a waveguide beam splitter is illustrated in Fig. 3.21. 

X-ray waveguides are also promising optical devices for the emerging field of 
ultra-fast X-ray optics at free electron laser (FEL) and higher harmonic generation 
(HHG) sources, since they support nearly dispersion-free pulse propagation down 
to ultra-short pulse width in the range of 0.1 fs [72]. FEL or HHG beam splitters 
with attosecond delay would be orders of magnitude smaller than macroscopic pulse 
delay stages. Spatial and temporal splitting of a pulse into two reflected beams, one 
displaced along the surface with respect to the other, can be also achieved by X-ray 
waveguides in resonant beam coupling geometry, based on a giant Goos-Hänchen 
effect [73]. As shown above for the stationary case, propagation is described by a 
finite number of guided modes, each with its own propagation constant and effective 
absorption index. The propagation of a short pulse is therefore governed by the 
effective dispersion and group velocity of the excited modes, which depend on the

Fig. 3.22 a, b, c Intensity distribution (envelope) of a 5 attosecond beam propagating at different
distances z = 0.5 mm (a), z ≈ 2 mm (b), z ≈ 4 mm (c) in a silicon slab waveguide (air/vacuum
guiding layer) of 100 nm diameter. The waveguide’s edges are indicated by black lines. From [72]


derivatives of the effective refractive indices for each mode. However, since these 
differ only very slightly, X-ray waveguides can be considered as nearly dispersion 
free optics down to femtosecond pulses, while dispersion effects start to become 
visible in form of mode separation only for attosecond pulses [72]. An example 
of pulse propagation in X-ray waveguides is shown in Fig. 3.22. A 12 keV pulse with a
width of 5 attoseconds is simulated in a planar silicon (slab) waveguide with vacuum
guiding layer of d = 100 nm. The modes separate spatially by a few nm after several
mm of propagation distance. Even if the pulse spectrum covers an absorption edge
of the cladding material, modal dispersion would manifest itself only for a
pulse width of 0.3 fs, according to simulations by time-dependent finite difference 
propagation in [72]. 


3.5 Diffractive Optics and Zone Plates 


In this section, we first recall the basic theory of Fresnel zone plate (FZP) optics, and 
then present different approaches of FZP fabrication. With the advent of improved 
fabrication techniques, smaller zones can now be achieved. However, this also 
requires advanced optical design concepts and numerical methods for simulation, as
presented in the last part of this section. Here we limit the discussion to the experi- 
mentally relevant case of binary zone plates, which are fabricated from two different 
materials; one of low and one of high density. The low density material can also be 
air or vacuum. 


3.5.1 Basic Theory of Fresnel Zone Plates 


We assume a plane wave of wavelength $\lambda$ propagating along the optical axis and
impinging on a circular aperture. The wave shall be focused to a point a distance
f downstream of the aperture. This focused wave is given as a sector of a spherical
wave. The focus is formed by constructive interference of waves transmitted through
rings around the optical axis at a distance $R = f + n\lambda/2$, $n \in 2\mathbb{N}_0$, from the focal point.
Rings with $n \in 2\mathbb{N}_0 + 1$, on the other hand, would interfere destructively. The rings (or annuli) of
different n form the so-called Fresnel zones. These zones form concentric circles
with radii


$$r_n = \sqrt{n \lambda f + \left(\frac{n\lambda}{2}\right)^2}. \tag{3.102}$$


For $n \ll f/\lambda = \mathcal{O}(10^7)$ for typical X-ray zone plates, the second term can be
neglected. If now the “odd zones” with $n \in 2\mathbb{N}_0 + 1$ are blocked out in the aperture,
the remaining waves interfere constructively in the focal spot. By Babinet’s principle,
blocking the “even zones” will lead to the same intensity. An optical device which
focuses light by absorbing light in the opaque rings is called an absorbing Fresnel
zone plate. By blocking light in some areas, a bright spot appears on the optical
axis. Augustin-Jean Fresnel was the first to obtain this result from calculation, as an
extension of the optical phenomenon of Arago’s spot. As a straightforward calculation
shows, however, the focusing efficiency of such an absorbing FZP is limited to
$1/\pi^2 \approx 10\%$ only (Fig. 3.23).
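The zone plate law (3.102) and the size of its second term can be checked numerically. The sketch below uses illustrative values (wavelength 0.1 nm, focal length 1 mm, 1000 zones) that are assumptions for the example, not parameters of a specific lens discussed here.

```python
import numpy as np

def zone_radii(n, wavelength, f, exact=True):
    """Zone radii r_n from (3.102); exact=False drops the (n*lambda/2)^2 term."""
    n = np.asarray(n, dtype=float)
    r_squared = n * wavelength * f
    if exact:
        r_squared = r_squared + (n * wavelength / 2.0) ** 2
    return np.sqrt(r_squared)

# Assumed example: lambda = 0.1 nm (12.4 keV), f = 1 mm, N = 1000 zones.
wavelength = 0.1      # nm
f = 1.0e6             # nm
n = np.arange(1, 1001)

r_exact = zone_radii(n, wavelength, f)
r_thin = zone_radii(n, wavelength, f, exact=False)

# Path length from zone boundary to focus minus f should equal n * lambda / 2.
path_difference = np.hypot(r_exact, f) - f

print(f"outermost radius: {r_thin[-1] / 1e3:.2f} um")
print(f"outermost zone width: {r_thin[-1] - r_thin[-2]:.2f} nm")
print(f"largest relative correction from the second term: {np.max(r_exact / r_thin - 1):.1e}")
```

For these numbers $n\lambda/(4f)$ stays far below one, so dropping the second term changes the radii only at the level of parts in $10^5$.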

Proposed by Lord Rayleigh in 1888, and first demonstrated by Wood ten years 
later, phase-reversing zones increase the efficiency to 40%. Instead of absorbing
every other zone by a thick material, a relative phase-shift of $\pi$ is introduced. At
hard X-ray energies of e.g. E = 12.4 keV, it is challenging to achieve a full phase-shift
of $\pi$. For example, for iridium with $n = 1 - 2.19 \times 10^{-5}$, an optical thickness of
2.28 µm would be required. We discuss fabrication techniques and their advantages
in the next subsection. The efficiency in the general case of a mixed absorbing/phase 
shifting zone plate follows further below. 
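The quoted thickness follows from a short calculation, written here as a worked equation. It uses the phase shift $\varphi = 2\pi t \delta/\lambda$ accumulated over a thickness t relative to vacuum and neglects absorption; the conversion constant $hc \approx 12.398$ keV Å is the only value not taken from the text.

```latex
\varphi = \frac{2\pi\,\delta\,t}{\lambda} = \pi
\quad\Longrightarrow\quad
t_\pi = \frac{\lambda}{2\delta},
\qquad
\lambda = \frac{hc}{E} \approx \frac{12.398\ \mathrm{keV\,\text{\AA}}}{12.4\ \mathrm{keV}} \approx 0.1\ \mathrm{nm},
\qquad
t_\pi \approx \frac{0.1\ \mathrm{nm}}{2 \times 2.19 \times 10^{-5}} \approx 2.28\ \mu\mathrm{m}.
```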


Fig. 3.23 Schematic of a Fresnel zone plate (left) in the aperture plane, and (right) in a plane
containing the optical axis. Different positive and negative diffraction orders m are obtained by
constructive interference. From [74]

Higher diffraction orders: Apart from the nominal focus at a distance f from the
zone plate, higher orders at distances f/m, $m \in \mathbb{N}$, occur. Generalizing the zone
plate law, the zone radii can be written as


$$r_n^2 = \frac{n}{m}\, \lambda f, \qquad n \in m\mathbb{N}_0. \tag{3.103}$$


If now m = 2, then $n/m \in \mathbb{N}_0$, and the neighbouring zones interfere destructively.
So there is no focal spot at f/2 (in the thin zone plate approximation; the even spots 
can appear by volume effects, see later). This argument also holds for higher even 
numbers. For odd numbers, e.g. m = 3, the condition for constructive interference 
is partly fulfilled for most of the zones. Hence, higher-order focal spots at f/3, f/5 
etc. appear. 


Negative diffraction orders: In the binary zone plates constructed as above, spher- 
ical waves converge onto the focal spot (and its higher order siblings). Applying the 
symmetry operations of time-reversal and inversion, however, also diverging waves 
are supported by the condition of constructive interference. Apart from the positive 
focal spots at f/m, also “negative orders” virtually emerging from spots located 
along the optical axis at $-f/m$ appear. These yield purely diverging waves. Usually,
the negative and higher orders are blocked by a pinhole, the order sorting aperture 
(OSA). 


Efficiency: In 1974, J. Kirz presented a thorough treatment of Fresnel zone
plates for soft X-rays, including the case of imaging at finite distances. Also, instead
of purely absorbing or phase-shifting zones, the general case for a material with
$n = 1 - \delta + i\beta$ was studied. Introducing the ratio $\eta := \beta/\delta$, and using Fresnel integrals,
the intensity of the first pair of zones can be written as


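$$
\begin{aligned}
I = |A_0 + A_1|^2 &= \left| \frac{1}{2\pi} \int_0^{\pi} e^{is}\, ds + \frac{1}{2\pi}\, e^{-2\pi\beta t/\lambda} \int_{\pi}^{2\pi} e^{i(s - 2\pi t\delta/\lambda)}\, ds \right|^2 && (3.104) \\
&= \frac{1}{\pi^2} \left( 1 + e^{-4\pi\beta t/\lambda} - 2 e^{-2\pi\beta t/\lambda} \cos(2\pi\delta t/\lambda) \right) && (3.105) \\
&= \frac{1}{\pi^2} \left( 1 + e^{-2\eta\varphi} - 2 e^{-\eta\varphi} \cos\varphi \right), && (3.106)
\end{aligned}
$$

with wavelength $\lambda$, optical thickness t and $\varphi := 2\pi t\delta/\lambda$. For all higher pairs of
zones, the integrals evaluate to the same result, which can hence be regarded as the
overall efficiency. We can deduce that even orders are not excited, and that higher
(and negative) odd orders m are suppressed by a factor $1/m^2$. For optimal efficiency,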


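$$0 = \frac{\partial I}{\partial \varphi^*} \propto \frac{\partial}{\partial \varphi^*} \left( 1 + e^{-2\eta\varphi^*} - 2 e^{-\eta\varphi^*} \cos\varphi^* \right); \tag{3.107}$$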


for $\eta \to 0$, $\varphi^*$ approaches $\pi$. The optimal optical thickness $t^*$ can be calculated as
$t^* = \varphi^* \lambda/(2\pi\delta)$.
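The efficiency (3.106) and the optimum condition (3.107) are easy to explore numerically. The sketch below locates the optimal phase $\varphi^*$ by a simple grid search rather than by solving the derivative condition; the grid search and the function names are choices made for this illustration.

```python
import numpy as np

def first_order_efficiency(phi, eta):
    """First-order efficiency (3.106) for phase shift phi and eta = beta/delta."""
    return (1.0 + np.exp(-2.0 * eta * phi)
            - 2.0 * np.exp(-eta * phi) * np.cos(phi)) / np.pi**2

def optimal_phase(eta, n_grid=200001):
    """Grid search for the phase shift maximizing (3.106) on [0, 2*pi]."""
    phi = np.linspace(0.0, 2.0 * np.pi, n_grid)
    return phi[np.argmax(first_order_efficiency(phi, eta))]

def optimal_thickness(eta, wavelength, delta):
    """t* = phi* lambda / (2 pi delta), in the units of wavelength."""
    return optimal_phase(eta) * wavelength / (2.0 * np.pi * delta)

for eta in (0.0, 0.1, 0.5):
    phi_star = optimal_phase(eta)
    print(f"eta = {eta:.1f}: phi*/pi = {phi_star / np.pi:.3f}, "
          f"efficiency = {first_order_efficiency(phi_star, eta):.3f}")
```

For $\eta = 0$ the search recovers the pure phase zone plate with $\varphi^* = \pi$ and efficiency $4/\pi^2$; with increasing absorption the optimum moves to smaller phase shifts, and for very thick absorbing zones the expression tends to $1/\pi^2$.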


3.5.2 Fabrication Techniques


X-ray microscopy has for long been limited by the difficulties of fabricating high 
resolution and high quality X-ray lenses, notably Fresnel zone plates. In fact, soft 
X-ray microscopy started with FZP fabrication by holographic laser illumination, 
pioneered at Institut für Röntgenphysik by G. Schmahl and colleagues in the 1960s 
and 1970s [76-78]. Subsequently, this fabrication technique was replaced by e-beam 
lithography in the 1980s, achieving a lateral resolution which was no longer limited
by visible light. The different steps of FZP fabrication by e-beam lithography are 
illustrated in Fig. 3.24. Major challenges were both in the writing process, e.g. a 
suitable pattern generator, write-field limitations, and interferometric positioning 
minimizing stitching errors, as well as in the structure transfer by reactive ion etching 
(RIE). Continuous efforts have pushed the limits towards the 10nm range for soft 
X-rays [34]. For hard X-rays, however, fabrication with the larger aspect ratio (zone
height to width ratio) required to achieve the necessary phase shifts becomes much
more demanding. Nevertheless, through the seminal work of C. David and his group at the
Paul-Scherrer-Institut, diffractive optics is today also established in the hard X-ray
regime. Special fabrication techniques such as zone-doubling have helped to increase
the aspect ratio [79], and progress has culminated in record focal spot sizes down to
17 nm (point focus) [80]. To push beyond these values, diffractive optics must be
fabricated by thin film deposition and subsequent dicing. With magnetron sputtering
(MS), for example, large thin films can be grown on a flat substrate. Two materials,
one optically “thin” and one “thick”, can be deposited alternately; this yields so-called
multilayer Laue lenses (MLLs) of virtually unlimited size [24, 25, 81]. Tens
of thousands of bilayers can be deposited with high accuracy. The final lens is then
prepared by cutting out a slice of desired optical thickness using a focused ion beam
(FIB) facility. Two such lamellae can then be used in series to form a two-dimensional
focus. For a benchmark study with sub-10 nm point focus, see [23].

Fig. 3.24 Different steps in FZP fabrication by e-beam lithography. a A resist film is deposited
by spin-coating. b The FZP pattern is written by the e-beam. c The illuminated resist is developed,
leaving behind a pattern of circular trenches. d Metal (e.g. Ni) is grown by electrochemical methods.
Electric conductivity is assured by the thin Au layer below. e The remaining resist is removed by
solvent. From [75]

In contrast, thin film deposition on a wire yields so-called multilayer zone plates
(MZPs). This goes back to an old idea [82, 83], which was also first implemented by
magnetron sputtering (MS) and subsequent dicing [84]. This sputter-slice technique,
however, was in most cases hampered by cumulative roughness, and the dicing also
introduced severe artifacts. Only in recent years could these difficulties be overcome
by use of pulsed laser deposition (PLD). By this approach, the group of U. Krebs
in Göttingen demonstrated cumulative smoothening of roughness [85] and was able
to grow smooth multilayers with ultrathin layers. Using a FIB, the final lamella 
can be precisely cut to the desired optical thickness. Aspect ratios of one to several 
thousands can be achieved [86], and MZP optics has been implemented for hard 
X-ray energies in the broad range from 8 keV up to above 100 keV [87]. Figure 3.25a 
shows a sketch of MZP fabrication by PLD. An intense laser pulse is focused onto 
the target material (not shown), which then evaporates. A plasma plume forms, from 
which gas atoms are deposited on the substrate. Smoothing is favored by highly 
energetic particles with kinetic energies of up to 100 eV, resulting in high mobility 
and enhanced diffusion on the substrate surface. Advanced focused ion beam (FIB) 
cutting and manipulation protocols yield well positioned and mounted MZPs [86, 
88]. For the MZP shown in Fig. 3.25b, a computer controlled KrF excimer laser
(wavelength of 248 nm) was used with pulse duration of 30 ns and repetition rate of
10 Hz. The laser beam was focused onto the different targets in ultrahigh vacuum
of about $10^{-8}$ mbar. The targets were moved constantly following an algorithm that
allows uniform ablation from different directions. The films were grown at room
temperature at a target-to-substrate distance of 6.5 cm [88]. The latest generation of
lenses is fabricated from Ta2O5 and ZrO2. For more information, see the progress
report in the second part of this book.


3.5.3 Diffractive Optics Beyond the Projection 
Approximation 


Above, we have described the working principle of optically thin zone plates. In the 
general case of a partially absorbing and phase-shifting zone plate, it is modelled 
as a complex-valued phase mask $\tau$ in two dimensions; the impinging wave-front
$\psi$ is modulated by this phase mask. Numerically, this is calculated as a pixel-wise
multiplication of two matrices:


$$\psi \mapsto \tau\psi, \qquad (\tau\psi)_{i,j} = \tau_{i,j} \cdot \psi_{i,j}. \tag{3.108}$$

Fig. 3.25 a Schematic of thin film deposition on a rotating wire, e.g. by pulsed laser deposition.
The zone plate is subsequently diced by a focused ion beam to the desired optical length. Adapted
from [75]. b Transmission electron micrograph of a MZP lens, consisting of alternating thin layers
of ZrO2 and Ta2O5 fabricated by PLD and FIB on a rotating pulled glass wire with diameter
$2r_0$ = 1.2 µm. The diameter is D = 3.2 µm, the outermost zone width is 10.0 nm, the
focal length for the photon energy E = 18 keV was f = 470 µm [27]
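The pixel-wise multiplication (3.108) can be sketched for a binary zone plate in the projection approximation. The mask below assigns vacuum transmission to even zones and the complex transmission of a slab of thickness t to odd zones, using the thin-lens zone law $r_n^2 = n\lambda f$. The grid size, focal length, thickness and the value of $\beta$ are placeholders chosen for the example; only $\delta = 2.19 \times 10^{-5}$ is taken from the text.

```python
import numpy as np

def zone_plate_mask(n_pix, pixel_size, wavelength, f, delta, beta, thickness):
    """Complex transmission of a binary zone plate (projection approximation)."""
    coords = (np.arange(n_pix) - n_pix // 2) * pixel_size
    xx, yy = np.meshgrid(coords, coords)
    # Zone index from the thin-lens zone law r_n^2 = n * lambda * f.
    zone_index = np.floor((xx**2 + yy**2) / (wavelength * f)).astype(int)
    k = 2.0 * np.pi / wavelength
    # Phase shift and attenuation of a slab with n = 1 - delta + i*beta.
    material = np.exp(-1j * k * delta * thickness - k * beta * thickness)
    return np.where(zone_index % 2 == 1, material, 1.0 + 0.0j)

# Placeholder parameters (lengths in nm).
wavelength = 0.1
tau = zone_plate_mask(n_pix=512, pixel_size=2.0, wavelength=wavelength, f=1.0e5,
                      delta=2.19e-5, beta=2.0e-6, thickness=500.0)

psi_in = np.ones((512, 512), dtype=complex)   # plane wave of unit amplitude
psi_out = tau * psi_in                        # element-wise product, not a matrix product
```

Note that `*` on NumPy arrays is the element-wise product required by (3.108), whereas `@` would compute a matrix product and give a physically meaningless result here.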


In the soft X-ray regime, where the optical thickness of FZPs is usually on the
order of a few hundred nanometres, this model can usually be justified. We define
the zone plate Fresnel number $F_{\mathrm{ZP}}$ as


$$F_{\mathrm{ZP}} := \frac{(\Delta r_N)^2}{\lambda t} \tag{3.109}$$


with outermost zone width $\Delta r_N = r_N - r_{N-1}$, wavelength $\lambda$, and thickness t. For
$\Delta r_N \geq 30$ nm, $\lambda \approx 3$ nm and $t \leq 300$ nm as an example of a soft X-ray FZP, $F_{\mathrm{ZP}} \geq 1$;
hence the treatment of a thin zone plate based on the projection approximation is 
completely adequate. For hard X-rays, however, we easily achieve $\Delta r_N = 5$ nm,
$\lambda = 0.1$ nm and $t = 5$ µm, resulting in $F_{\mathrm{ZP}} = 0.05$. This gives a clear indication that
diffraction effects within the FZP itself have to be accounted for. More specifically,
the kinematic or Born approximation of single diffraction at the phase mask $\tau$ has to
be replaced by dynamical diffraction theory. For such optically thick optics, volume 
effects have to be taken into account. 
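The two regimes can be compared with a few lines of arithmetic. The sketch evaluates (3.109) for the soft X-ray limits quoted above and for a hard X-ray case, using an outermost zone width of 5 nm, the value consistent with $F_{\mathrm{ZP}} = 0.05$ under (3.109); the function name is illustrative.

```python
def zone_plate_fresnel_number(dr_outer, wavelength, thickness):
    """F_ZP = (outermost zone width)^2 / (wavelength * thickness), all in the same length unit."""
    return dr_outer**2 / (wavelength * thickness)

# All lengths in nm.
f_soft = zone_plate_fresnel_number(dr_outer=30.0, wavelength=3.0, thickness=300.0)
f_hard = zone_plate_fresnel_number(dr_outer=5.0, wavelength=0.1, thickness=5000.0)

print(f"soft X-ray example: F_ZP = {f_soft:.2f}")
print(f"hard X-ray example: F_ZP = {f_hard:.2f}")
```

A value near or above one indicates that the projection approximation is reasonable, while values well below one signal that propagation inside the zone plate has to be modelled.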

A Takagi-Taupin based theory for MLL optics has been derived by Yan et al. 
[89], extending previous dynamical treatments denoted as coupled wave theory [2, 
90, 91]. Here we briefly summarise their model and findings of [89]. The derivation 
is similar to that presented above for multilayer mirrors (MLMs) and starts with 
the Helmholtz equation of a scalar or vector field amplitude that interacts with the

Fig. 3.26 a Schematic of diffraction orders and multi-order focusing. Positive orders are shown in
red, negative in green. The orders result from the binary nature of the zone plate. For FZP imaging, 
order sorting apertures can be used to select only the m = +1 diffraction order. Alternatively, 
algorithmic schemes to dissect different orders can be used, which is, however, still challenging 
[27]. The reconstruction of the focal field can be achieved by iterative projection algorithms [4, 27]. 
b Simulation of the focused intensity, on a logarithmic false-color representation along the optical 
axis; compare to (a). From [27] 


pseudo-periodic susceptibility $\chi(r)$. For MLMs, the Fourier series can be truncated
after one term, and only two beams (incoming and reflected) are considered. MZPs,
on the other hand, show multiple diffractive orders, and hence multiple beams and
more Fourier orders need to be taken into account. A further complication arises
since $\chi(r)$ is not a simple periodic function, but changes according to the zone plate
law. Nevertheless, Yan and co-workers argue that the zone plate can be considered as
a “strained crystal” with a varying d-spacing of $d = 2\Delta r_n$. They use the coordinate
transformation (Fig. 3.26)


$$r' = \frac{T}{\lambda} \left( \sqrt{r^2 + f^2} - f \right), \tag{3.110}$$


where T is the period of the new, unstrained and fully periodic lattice. The transfor- 
mation yields a phase-factor 


$$\exp(i \varphi_h) := \exp\left( i\, \frac{2\pi h}{\lambda} \left( \sqrt{r^2 + f^2} - f \right) \right), \tag{3.111}$$


where h is the index of the diffractive order under consideration. Decomposing the 
field E into components $E_h$, and using a truncated series expansion for $\chi$, yields a
set of coupled partial differential equations describing the system. Within the Takagi-Taupin
formalism, the gradient of the phase $\varphi_h$ is equal to the local reciprocal lattice


vector: 
2th ~ 
Ph := a = Voy. (3.112) 


This shows that, apart from the geometrical considerations for constructive interference
discussed above, zone plates can also be considered as crystal optics where
the local d-spacing is chosen such that all diffracted beams of a certain order h point to
the same focal spot. When volume diffraction occurs, the diffracted X-ray beams are
disturbed significantly within the structure. Geometrically speaking, a ray diffracted
at the entrance of the zone plate at a specific zone would enter another zone while
traversing the zone plate. Then, multiple diffraction will not follow Bragg’s law. To
match the diffraction angles, the originally parallel zones have to be varied along the
optical axis. To this end, Yan et al. discuss tilted, wedged, and curved zones. Based
on their computations, a focusing efficiency of 67% at sub-1 nm spot sizes at a photon
energy of 19.5 keV is possible using MLLs fabricated from Si and WSi2 [89].


3.6 Basic Coherence Theory and Simulations for X-ray 
Optics 


Coherence of light beams refers to their ability to exhibit interference effects. Already
in the first interference experiment of light, the famous double-slit experiment of
1801, Thomas Young discussed the “visibility of fringes”, which today is referred
to as the degree of coherence $\gamma(r_1, t_1, r_2, t_2)$ between two space-time points $(r_1, t_1)$
and $(r_2, t_2)$. Whenever light waves emerging from these two points superimpose at
a third point, the total intensity $I_{1,2}$ in general differs from the sum of the single
intensities, $I_{1,2} \neq I_1 + I_2$. This is immediately clear since Maxwell’s equations are
linear in the amplitudes u, but not in the intensity $|u|^2$


$$u_{1,2} = u_1 + u_2$$


$$I_{1,2} = |u_1 + u_2|^2 = |u_1|^2 + |u_2|^2 + 2\gamma_{1,2} |u_1| |u_2| \neq |u_1|^2 + |u_2|^2.$$


In the following, we first give the basic definitions of coherence functions from 
literature; afterwards, we will briefly discuss an analytical treatment for synchrotron
radiation. Using a stochastic model suitable for the numerical treatment of partially 
coherent propagation of light through various optical elements and samples, we will 
show that X-ray waveguides can indeed be used as coherence filtering devices. 


3.6.1 Basic Definitions 


Consider a scalar wave-field with complex amplitude u(r, t). Then the intensity 
I (r, t) at the space-time-point (r, t) can be defined as 


$$I(r, t) := u^*(r, t)\, u(r, t), \tag{3.113}$$


and the mutual intensity $\Gamma(r_1, t_1, r_2, t_2)$ between the given two space-time points as


$$\Gamma(r_1, t_1, r_2, t_2) := u^*(r_1, t_1)\, u(r_2, t_2). \tag{3.114}$$


Note that higher-order correlations involving more than two space-time points 
can be defined, but are rather uncommon in practice. The temporal Fourier transform 
$W_{1,2} = \int d\tau\, \Gamma \exp(i\omega\tau)$ of the mutual intensity is known as the cross-spectral
density. Note that the indices (1, 2) are often used for notational simplification to 
denote the two space-time-points. The normalized mutual intensity 


$$\gamma(r_1, t_1, r_2, t_2) = \frac{\Gamma(r_1, t_1, r_2, t_2)}{\sqrt{I(r_1, t_1)\, I(r_2, t_2)}} \tag{3.115}$$


is a measure of the cross-correlation of the fields at two different points in space and
time. For stationary signals, $\gamma$ depends only on the time difference $\tau = t_2 - t_1$ and is
also denoted as the complex degree of coherence. Further, for quasi-monochromatic
waves, it is sufficient to consider the mutual intensity at the same time $t_1 = t_2$, since
the time-harmonic variation of the fields is trivial. This same-time mutual intensity
depends only on the spatial coordinates of two points


$$J(r_1, r_2) = \langle u^*(r_1)\, u(r_2) \rangle_T, \tag{3.116}$$


where $\langle \ldots \rangle_T$ denotes the time-average over at least a period T, or for practical
purposes the illumination time of the experiment. The mutual intensity J contains all
information about measurable intensities. Finally, the normalized same-time mutual
intensity is defined as


$$j(r_1, r_2) = \frac{J(r_1, r_2)}{\sqrt{J(r_1, r_1)\, J(r_2, r_2)}}. \tag{3.117}$$


We use the normalized quantity $j$ if we are interested in, for example, the visibility of
interference fringes rather than the absolute intensity values. If one considers a
Young's double slit experiment with quasi-monochromatic light and two slits at
points $\mathbf{r}_1$ and $\mathbf{r}_2$, the emitted spherical waves yield an interference pattern, with the
fringe visibility (i.e. the Michelson contrast of the fringes) given by $|j|$. For $0 < |j| < 1$
we call the light field partially coherent, whereas $|j| = 1$ and $|j| = 0$ denote fully
coherent and incoherent light, respectively. These limiting cases can in fact never be
completely realized.
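The link between $|j|$ and the fringe contrast can be illustrated with a short numerical sketch. It assumes the standard two-beam interference law $I = I_1 + I_2 + 2\sqrt{I_1 I_2}\,|j|\cos\varphi$, which is not written out in the text above; the function names are hypothetical and chosen only for this illustration.

```python
import numpy as np

def double_slit_intensity(phase, i1, i2, j_abs):
    """Two-beam interference law for partially coherent light (assumed form)."""
    return i1 + i2 + 2.0 * np.sqrt(i1 * i2) * j_abs * np.cos(phase)

def michelson_contrast(intensity):
    """Fringe visibility (I_max - I_min) / (I_max + I_min)."""
    i_max, i_min = intensity.max(), intensity.min()
    return (i_max - i_min) / (i_max + i_min)

# Phase difference between the two slits, sampled over two fringe periods.
phase = np.linspace(0.0, 4.0 * np.pi, 2001)

for j_abs in (0.0, 0.5, 1.0):
    pattern = double_slit_intensity(phase, 1.0, 1.0, j_abs)
    print(f"|j| = {j_abs:.1f}  ->  contrast = {michelson_contrast(pattern):.3f}")
```

For equal slit intensities the contrast reproduces $|j|$; for unequal intensities it is reduced by the additional factor $2\sqrt{I_1 I_2}/(I_1 + I_2)$.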

In many practical problems, we are furthermore primarily interested in evaluating
$j$ in a plane orthogonal to the optical axis, e.g. to study coherence in the plane of an
optic, sample or detector. The coherence properties in any one of such planes, however,
evolve as the beam propagates. For the sake of concreteness, let $z$ be the optical
axis, and let $y$ denote the lateral direction of interest. For simplicity, we drop the
dependence on $x$, and consider $z$ as a parameter. In view of interference, we are often
interested in the field correlation between a point $y_1$ and another point $y_2 = y_1 + d$
at a lateral distance $d$. For linear optical systems which are characterized by lateral
shift invariance, the degree of coherence is homogeneous in planes perpendicular to
the optical axis, and hence

$$j(d) := j\big((y, z), (y + d, z)\big). \tag{3.118}$$


For experiments it is then sufficient, for example, to fix one point to the optical
axis and to measure $j(d) := j\big((0, z), (d, z)\big)$. What can we say about the functional
dependence of $j(d)$? We expect the correlation to decrease with distance $d$ (although this
need not always be the case), and would like to associate a characteristic length $\xi_\perp$ to
the decay of $j(d)$. For the simplest case of an incoherent field in plane $z = 0$ and
paraxial propagation, it can be shown that the degree of coherence is given by a
Fourier transform of the intensity in the source plane (van Cittert-Zernike theorem)

$$j(d) = \int_{-\infty}^{\infty} dy'\, I(y') \exp\!\left(\frac{2\pi i}{\lambda z}\, y' d\right), \tag{3.119}$$

i.e. we can easily predict how the coherence evolves by propagation. By way of this
Fourier relationship, we see that the spatial coherence length scales as

$$\xi_\perp \simeq \frac{\lambda z}{2 s}, \tag{3.120}$$


where $s$ is the source size. Correspondingly, $\xi_\perp / z$ defines a "coherence angle". Note
that a precise definition of $\xi_\perp$ would require us to be more precise about the cut-off
value to which $j$ would be allowed to decrease, as well as more information
or assumptions on the source distribution $I(y')$. However, this may all be incorporated
into a prefactor. As an example, let us consider the 3rd generation synchrotron
source PETRA III with a horizontal source size of $\sigma = 36$ µm ($1\sigma$) in the low-$\beta$
sections. For $\lambda = 0.1$ nm the coherence angle is approximately 2 µrad. Typical optics, however,
accept a beam angle of about 5 µrad, and the full beam has an opening angle
of 100 µrad. One is thus facing the situation of rather reduced partial coherence, and
small entrance slits or pinholes are required for coherent imaging or photon correlation
spectroscopy experiments. For more information and the useful Gaussian shell
model (GSM) to describe coherence properties of synchrotron beams, we refer to
[92, 93]. Note that synchrotron sources are actually not fully incoherent; the radiation
emitted by the ultra-relativistic electron beam with $\gamma = (1 - (v/c)^2)^{-1/2} \approx 10^4$ is confined
to a small cone with opening angle $1/\gamma \approx 10^{-4}$ rad. This yields a considerable
correlation already in the source plane, which can be incorporated in the GSM by an
additional parameter. Nevertheless, at the position of experiments, $\xi_\perp$ is dominated by
propagation. While $\xi_\perp$ is the length scale characteristic for the correlations across a
wavefront, coherence along the direction parallel to the optical axis is characterized
by the longitudinal or temporal coherence length


(3.121)
where $\Delta\lambda$ is the spectral width. This relationship follows from the Wiener-Khinchin
theorem.
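The transverse estimates above can be checked with a few lines of arithmetic. The following sketch evaluates the coherence angle $\lambda/(2s)$ for the quoted PETRA III numbers and integrates (3.119) numerically for a Gaussian source, normalised so that $|j(0)| = 1$. The propagation distance of 85 m, the Gaussian source profile and all function names are assumptions made for illustration only.

```python
import numpy as np

def vcz_degree_of_coherence(d, sigma, wavelength, z, n_samples=4001):
    """Normalised |j(d)| from (3.119) for a Gaussian source of rms size sigma."""
    y = np.linspace(-8.0 * sigma, 8.0 * sigma, n_samples)   # source coordinate y'
    intensity = np.exp(-y**2 / (2.0 * sigma**2))            # assumed I(y')
    kernel = np.exp(2j * np.pi * np.outer(d, y) / (wavelength * z))
    j = (kernel * intensity).sum(axis=1) / intensity.sum()  # grid spacing cancels
    return np.abs(j)

def gaussian_prediction(d, sigma, wavelength, z):
    """Analytic Fourier transform of the Gaussian source, for comparison."""
    return np.exp(-2.0 * np.pi**2 * (sigma * d) ** 2 / (wavelength * z) ** 2)

wavelength = 0.1e-9   # m
sigma = 36e-6         # m, horizontal source size (1 sigma)
z = 85.0              # m, assumed distance between source and optic

coherence_angle = wavelength / (2.0 * sigma)   # xi_perp / z from (3.120)
xi_perp = coherence_angle * z
print(f"coherence angle: {coherence_angle * 1e6:.2f} urad")
print(f"xi_perp at z = {z:.0f} m: {xi_perp * 1e6:.0f} um")

d = np.linspace(0.0, xi_perp, 5)
print(vcz_degree_of_coherence(d, sigma, wavelength, z))
print(gaussian_prediction(d, sigma, wavelength, z))
```

Comparing the decay of $|j(d)|$ with $\xi_\perp$ makes the remark about the prefactor concrete: how far $|j|$ has dropped at $d = \xi_\perp$ depends on whether $s$ is read as an rms width, a FWHM or a full width.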


3.6.2 Stochastic Model 


How can we model the emission of "chaotic sources"? Such sources do not emit plane
waves, but rather wave-trains of finite duration, and hence of finite bandwidth. In
addition, each emitter acts on its own, independent of its neighbours. Say the temporal
duration of a wave-train is $\tau_{\mathrm{train}}$. During the acquisition time $T_{\mathrm{acq}} \gg \tau_{\mathrm{train}}$, a detector
registers light from many subsequent wave-trains, each of which has been emitted
with a random phase. While a single wave-train may give rise to a fully coherent interference
pattern, the patterns of different wave-trains are out of phase and hence shifted
spatially. Over sufficiently long times, the patterns wash out and the interference fringes
vanish. Usually, for monochromatized synchrotron radiation the relative bandwidth
is on the order of $10^{-4}$ (common Si-111 monochromator). At X-ray frequencies of
$\omega \approx 10^{18}\,\mathrm{s}^{-1}$, this yields correlation times of $\tau_c \approx 10^{-14}$ s or shorter. This is of course
well below the response time of typical detectors and shorter than the pulse duration.
At X-ray free electron laser sources, however, extremely short pulses can be produced,
and we can expect full temporal coherence. The finite temporal or longitudinal
coherence time directly translates to the corresponding length $\xi_\parallel = c\tau_c \approx 1$ µm,
and hence also the largest possible path length difference between two interfering
waves is still much larger than molecular length scales.

Based on this simple picture of finite wave trains with stochastic phases varying
in time and space, we can introduce a simple stochastic model to treat partial
coherence numerically. We replace the continuous extended source of size $D$ by a
set of $N$ independent point-sources. The field of each point-source can be propagated
numerically through an optical system, e.g. by solving the Fresnel-Kirchhoff
integral for focussing mirrors, or the paraxial wave-equation for waveguides. For
each point-source, a complex-valued field in some region of interest is thus obtained,
denoted by $u_n(x, y)$ for the $n$th point-source and given in a two-dimensional area
(the generalisation to three dimensions is obvious). We define a single stochastic
realisation as the random superposition

$$u_{\mathrm{speckle}} := \sum_n w_n c_n u_n \tag{3.122}$$

with random phase factors $c_n = \exp(i\varphi_n)$, $\varphi_n \in (0, 2\pi]$, and with weighting factors
$w_n$ corresponding to the intensity envelope of the source. A single realisation corresponds
to the interference pattern of a short wave-train with coherence time $\tau_c$;
the pattern of a long exposure time $T \gg \tau_c$ is modelled by the time average $\langle \ldots \rangle_T$
over superpositions with random phase coefficients $c_n$. Numerically, a few thousand
realisations should be taken into account. From such an ensemble, we calculate the
degree of coherence $j$ as
(3.123)
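A minimal numerical sketch of this stochastic model is given below. It is not the simulation code behind the figures of this chapter: free-space paraxial spherical waves stand in for the numerically propagated fields $u_n$, the weights are taken as $w_n = \sqrt{I(y_n)}$ for an assumed Gaussian source, and the degree of coherence is estimated by inserting the ensemble average into the definitions (3.116) and (3.117). All names and parameter values are hypothetical.

```python
import numpy as np

def point_source_fields(x, y_src, wavelength, z):
    """Paraxial spherical waves u_n(x) from point sources at positions y_src."""
    dx = x[None, :] - y_src[:, None]
    return np.exp(1j * np.pi * dx**2 / (wavelength * z))

def ensemble_coherence(fields, weights, n_real, rng, ref=0):
    """Estimate j(x_ref, x) from n_real random-phase superpositions (3.122)."""
    n_src = fields.shape[0]
    phases = rng.uniform(0.0, 2.0 * np.pi, size=(n_real, n_src))
    coeff = weights[None, :] * np.exp(1j * phases)       # w_n c_n per realisation
    u = coeff @ fields                                   # shape (n_real, n_x)
    mutual = np.mean(np.conj(u[:, [ref]]) * u, axis=0)   # J(x_ref, x), cf. (3.116)
    intensity = np.mean(np.abs(u) ** 2, axis=0)          # J(x, x)
    return mutual / np.sqrt(intensity[ref] * intensity)  # cf. (3.117)

def exact_coherence(fields, weights, ref=0):
    """Infinite-ensemble limit: cross terms between different sources vanish."""
    w2 = weights**2
    mutual = (w2[:, None] * np.conj(fields[:, [ref]]) * fields).sum(axis=0)
    intensity = (w2[:, None] * np.abs(fields) ** 2).sum(axis=0)
    return mutual / np.sqrt(intensity[ref] * intensity)

wavelength, z, sigma = 0.1e-9, 85.0, 36e-6           # illustrative values
y_src = np.linspace(-3.0 * sigma, 3.0 * sigma, 64)   # N = 64 point sources
weights = np.exp(-y_src**2 / (4.0 * sigma**2))       # w_n^2 follows a Gaussian I(y)
x = np.linspace(0.0, 60e-6, 21)                      # observation points

fields = point_source_fields(x, y_src, wavelength, z)
rng = np.random.default_rng(0)
j_mc = ensemble_coherence(fields, weights, 4000, rng)
j_exact = exact_coherence(fields, weights)
print(np.round(np.abs(j_mc), 2))
print(np.round(np.abs(j_exact), 2))
```

The statistical error of the estimate decreases roughly as the inverse square root of the number of realisations, which is consistent with the recommendation of a few thousand realisations.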


3.6.3 Coherence Propagation and Filtering 


Using the stochastic model, the partially coherent X-ray intensity and the degree
of coherence can be propagated through an optical system. Here we briefly illustrate
this with an example of mode propagation in X-ray waveguides, which can be
used as coherence filters [66]. We also address nano-focusing of partially coherent
radiation, using the example of the KB focus of the GINIX endstation at the P10
beamline of the PETRA III storage ring at DESY [45, 51, 94]. Figure 3.27a shows
the simulated intensity distribution $I(x, y) = J((x, y), (x, y))$ in the focal region
(colour-coded) for X-ray photon energy $E = 7.9$ keV, and the geometric parameters
of the horizontally focusing mirror (HFM). The green curves show iso-lines of the
degree of coherence $|j(d, z)| = |j((x = 0, z), (x = d, z))|$ along the optical axis. As
can be seen, only the central part of the intensity distribution is coherent. In fact,
the spatial coherence drops below 0.5 for separations larger than the predicted coherence
length $\xi_\perp$, which in the focal plane is 74 nm, while the FWHM beam size is 220 nm.
Using Talbot interferometry, a beam size of 203 nm (FWHM of a Gaussian fit) and a
coherence length of $0.37 \times 203$ nm $= 75$ nm have been measured [94]. To filter the
coherent part of the illumination, an X-ray waveguide can be placed into the focal
spot. In the simulation, a $D = 50$ nm guiding layer (vacuum) in Si is illuminated
by independently propagated point-sources, and the complex-valued amplitudes are
propagated using the paraxial methods described above. On this set of basic fields
$u_n$, a stochastic ensemble average is performed. The result is shown in Fig. 3.27b. Again, the
colour-code shows the partially coherent intensity $I(x, z) = J((x, z), (x, z))$; as can be
seen, only the central part of the focused beam is coupled into the guiding layer. As
the iso-lines of coherence along the optical axis show, the beam is now fully coherent.
This effect has also been demonstrated experimentally [94].

Using X-ray waveguides, it is also possible to directly measure the intensity distribution
by scanning them through the focal plane. This is of interest for the characterisation
of X-ray nano-focus instruments. Figure 3.28 shows results obtained in a
characterisation of the KB optics at the NanoMAX beamline at the MAX IV synchrotron
in Lund. With a high dynamic range exceeding $1 : 10^4$, it is even possible to
resolve interference patterns due to slit scattering of the KB entrance slit. From the
visibility of these fringes, coherence properties can be quantified, since finite, or partial,
coherence reduces the visibility of interference fringes. The resulting intensity
pattern in the focal plane has been calculated based on an analytical model. Together
with the measured profiles, the degree of coherence was determined as a function of
(secondary) source size.

Fig. 3.27 Focus and coherence properties of KB focused (left) and WG filtered (right) X-ray beams:
a shows the color-coded intensity distribution in the focal spot of the KB mirror, with isolines of
the degree of coherence (j = {0.4, 0.6, 0.8}); in b, the focused field is coupled into a WG. Plots c-f
are cuts of intensity (red) and coherence (green) at the indicated positions (focal spot, after 1 µm;
at the center of the WG, after 0.1 µm). From [66]


3.7 Putting It All Together: Optics and X-ray 
Instrumentation 


As a last note, we want to stress the importance of integrating X-ray optics into
an instrument, be it an X-ray microscope, diffractometer or spectrometer. Notwithstanding
the importance of its individual optical components, such as mirrors, waveguides,
CRLs or Fresnel zone plates, the larger challenge is to put it all together into
a fully working synchrotron instrument and beamline. The tasks are many: optical
design and simulation, instrument control, radiation safety, precision, as well as data
acquisition and management. Unlike for other analytical techniques, instrumentation
development is still largely in the hands of research groups, rather than commercial
providers. The diversity of SR instruments is impressive. For a comprehensive view,
we refer to the online documentation provided by almost all SR facilities, the beamline
articles of IUCr's Journal of Synchrotron Radiation, and the proceedings of the
recent international conference on SR instrumentation. Instead of even trying to give
an overview, we present an example: the Göttingen Instrument for Nano-Imaging
with X-rays (GINIX) at the P10 coherence beamline of the PETRA III storage ring
at DESY in Hamburg [45]. The instrument has been designed for nanoscale
focusing using compound optical systems. This can be either the combination of
a focusing and a filtering step [30, 31, 67], or two sequential focusing steps
[4, 27, 81].

Fig. 3.28 Characterisation of focus and lateral coherence of a hard X-ray nanoprobe (NanoMAX
beamline, MAX IV synchrotron). The focal intensity distribution is measured by scanning a waveguide
through the focus. The coherence properties are quantified by analysis of the fringe visibility
of the KB tails. a Two measured focus profiles of the KB mirror for partially coherent setting
(secondary source size 20 µm, blue dashed curve) and quasi-coherent setting (secondary source
size 5 µm, red line), shown on linear (left) and logarithmic (right) scale. b Intensity profiles of the
three guided waveguide modes (left), and superposition of the three modes (right). c Simulated and
experimental focus tails, for different coherence settings. The blue line shows the simulated intensity
in a fully coherent setting; the red dashed line is convolved with the waveguide mode structure;
the green dotted line is calculated for a secondary source size of 15 µm (and convolved with the
waveguide guiding channel). The black circles represent experimental data for a secondary source
size of 15 µm, with a sinusoidal fit shown as a black line. Experimental data has been shifted by a
factor of 1 : 3.5 vertically for clarity. d Fringe contrast, quantifying the degree of coherence, as a
function of the secondary source size. The blue thin line is simulated for ideal sampling, while the
red thick line accounts for the convolution with the WG modes. The orange circles show contrast
fits for experimental data; orange lines correspond to 2σ error bars of the fit. From [95]

Figure 3.29 illustrates (a) the optical path of the P10 beamline, with its components
and respective distances from the undulator source, as well as (b, c) the
nano-focus optics and sample stage of the instrument. GINIX comprises a modular
compound nano-focus optical system, composed of a high-gain fixed-curvature (KB)
mirror and an X-ray waveguide module, which is used for holographic imaging and
tomography [61]. For scanning SAXS recordings [98], a soft-edge aperture is used
to clean the KB tails. Ultimate sub-10 nm focusing is possible with a high-resolution
scanning stage, equipped with MZP optics [27]. Three major imaging modalities are
supported by the instrument: (i) near-field phase contrast imaging, also denoted as
in-line holographic imaging, (ii) far-field coherent diffractive imaging (CDI) with
ptychographic phase retrieval, and (iii) scanning nano-diffraction, in the small-angle
or wide-angle regime (scanning SAXS/WAXS). The KB mirrors are positioned at
~85 m behind the undulator source, and can be operated in the photon energy range
between 6 and 14 keV [45]. The two orthogonal mirrors with Rh coating are polished
to a fixed elliptical curvature, are each 100 mm long, and accept a maximum
beam size of ~0.4 mm. The first mirror focuses in the vertical direction (VFM) with
focal length $f = 302$ mm and incidence angle $\theta = 3.954$ mrad (mirror center); the
second mirror focuses in the horizontal direction (HFM) with focal length $f = 200$ mm
and incidence angle $\theta = 4.05$ mrad (mirror center). Optical metrology has confirmed
a surface quality with height deviation of <0.5 nm peak-to-valley. To reduce beam-induced
degradation, the KB system is operated in an ultra-high vacuum vessel. The
beam size impinging onto the mirrors can be controlled by slits. When the slits are
opened, the mirrors are operated under conditions of partial coherence, since the
geometric acceptance exceeds the spatial coherence length. However, at the expense
of flux density, one can select the coherent fraction by closing the slits in front of the
KB [99]. The focus then becomes diffraction limited, i.e. fully coherent, which is
important for coherent diffractive imaging (CDI) and ptychography. Depending on
orbit parameters, slit settings and alignment status, focal spot sizes down to about
200 nm × 200 nm (FWHM, as measured by waveguide scans) can be achieved
with a flux larger than $10^{11}$ ph/s [94]. Coherent illumination of the mirrors by closing
the slit in front of the KB results in focus sizes in the range of
150-500 nm [97, 99-101]. A ptychographic probe reconstruction is shown in Fig.
3.30. Different ptychographic and multi-plane probe reconstruction methods have
been compared in [102]. As one can see, quite a bit of effort is required to realize

Fig. 3.29 The GINIX endstation, installed in the second experimental hutch (EH2) of the
PETRA III/P10 beamline at DESY. a Schematic of the beamline's optical path, with (A) undulator,
(B) primary slits, (C) secondary slits, (D) double crystal monochromator, (E) horizontal mirrors,
(F) girder system with slits, (G) slits, (H) fast shutter, (I) monitor, (J) attenuators, (K) experimental
setup in EH1, (L) slits, (M) monitor, (N) experimental setups in EH2 (GINIX or diffractometer),
(O) rear detector bench. From [96]. b CAD drawing, showing (from right to left) the KB mirror
tank (light grey), the hexapod with the waveguide optics (dark grey), the mounting for the cleaning
apertures (dark grey top), online optical microscopes for alignment and inspection (yellow, brown),
and the tomography sample stage (blue). c Photograph of the instrument

Fig. 3.30 Ptychographic reconstruction of the KB probe at 13.8 keV with KB entrance slits
100 µm × 100 µm, reconstructed from data recorded by the Lambda detector and without pinhole
(dataset 3 in [97]). a Amplitude and phase are drawn according to the colorbar next to the image.
Vertical and horizontal line-cuts through the intensity in the focal plane yield FWHM = 217 nm and
FWHM = 136 nm, respectively. b, c Intensity distribution as calculated from numerical propagation
of the reconstructed field, in b the xy and c the xz planes. d Normalized sharpness as obtained
from an area integral of the squared intensity, along with a Lorentzian fit, yielding a depth of focus
(DOF) of 1.53 mm (FWHM). Scale bar in a: 500 nm. From [45]
Chapter 4
Statistical Foundations of Nanoscale
Photonic Imaging


Axel Munk, Thomas Staudt and Frank Werner 


Essentially, all models are wrong, but some are useful. 
— George Box 


4.1 Introduction 


4.1.1 Background and Examples 


The term ‘photonic imaging’ describes an optical imaging setup where the available 
measurement data Y are counts of detected photons. The origin of these photons 
can be diverse in its nature. In coherent X-ray imaging (see e.g. Chap. 2), photons 
emitted by an X-ray source (like a free electron laser) are scattered (and/or absorbed) 
by a specimen. In fluorescence microscopy (see e.g. Chap. 1 or Chap. 7), marker 
molecules are excited by an excitation pulse and emit photons with a certain proba- 
bility. These two examples are characteristic for the wide range of scenarios arising 
in photonic imaging: in coherent X-ray imaging we have on the one hand single- 
molecule diffraction data composed of only few photons [1], and on the other hand 
holographic experiments where millions of photons can be collected from one sam-
ple [2]. In fluorescence microscopy, the number of photons is intrinsically limited 
to a few hundred or thousand per marker due to bleaching effects, and in case of 
temporally resolved measurements, only a handful of photons is available per time 
step [3]. Similar restrictions arise in related imaging modalities, including those 
based on Förster resonance energy transfer (FRET) or metal induced energy transfer 
(MIET), see e.g. Chap. 8 or [4, 5] for a discussion. Although not within the context 
of nanoscale imaging, statistically related is astrophysical imaging. Here, there is no 
a priori limit for the observation time and hence for the number of photons. How- 
ever, the former is practically limited to several minutes to avoid severe motion blur, 
see e.g. [6, 7] for examples. We also mention positron emission tomography (PET), 
where the total number of emitted photons should be as small as possible to minimize 
the radiation dose for the patient [8]. In all of these applications, detected photons 
can also originate from undesired background contributions, whose nature strongly 
depends on the experimental setup, adding additional noise to the observations. 


4.1.2 Purpose of the Chapter 


The aim of this chapter is to give an overview of prototypical approaches to model
the data emerging in photonic imaging from a statistical point of view, based on the 
physical modeling of photon observation. A sketch of the typical imaging setup we 
consider is presented in Fig. 4.1. 

Fig. 4.1 Sketch of the imaging process. A source emits photons that are mapped onto a (binned)
detection interface $\Omega$ through the optical system. The underlying photon intensity $\lambda(\cdot, t)$ at time $t$,
which is determined by the physics of the specific imaging setup, is used for statistical modeling of
the detected signal

We assume that the imaging process is described by an underlying photon
intensity $\lambda: \Omega \times [0, T] \to [0, \infty)$ at the detector interface, where $\Omega$ is the spatial
domain of observation (which can be two- or three-dimensional) and $T$ is
the total observation time. Let us enumerate the emitted photons by $1, \ldots, N$ and
denote their specific detection position and time by $(x_i, t_i) \in \Omega \times [0, T]$. For a given
(measurable) subset $A \subset \Omega$ and time interval $I \subset [0, T]$ we write
$Y(A \times I) := \#\{1 \le i \le N \mid x_i \in A,\ t_i \in I\}$ to denote the number of photons observed in $A$ during
$I$. The expected number of photons detected in $A \times I$ is by definition of $\lambda$ given
by

$$\mathbb{E}[Y(A \times I)] = \int_I \int_A \lambda(x, t)\, dx\, dt. \tag{4.1}$$

Note that this includes all detected photons, including all background contributions.
We will always assume $\lambda \ge 0$, which ensures that the integral in (4.1) is well-defined
(although it might be $\infty$).

Throughout this manuscript, we will discuss statistical models for the distribution
of the observations $Y$, depending on the physical measurement setup. We assume
$\lambda$ to be given, as deriving or estimating $\lambda$ and/or other model parameters described
(implicitly) by $\lambda$ is the topic of other expositions (see e.g. Chap. 5 or Chap. 11).
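To make the counting notation concrete, the following sketch evaluates $Y(A \times I)$ for a list of detection events and approximates the double integral in (4.1) with a midpoint rule. It assumes a one-dimensional spatial domain, rectangular sets $A = [a_{lo}, a_{hi})$ and $I = [t_{lo}, t_{hi})$, and a made-up example intensity; none of these choices come from the text.

```python
import numpy as np

def count_photons(events, a_lo, a_hi, t_lo, t_hi):
    """Y(A x I): number of events (x_i, t_i) with x_i in A and t_i in I."""
    x, t = events[:, 0], events[:, 1]
    inside = (x >= a_lo) & (x < a_hi) & (t >= t_lo) & (t < t_hi)
    return int(np.count_nonzero(inside))

def expected_count(intensity, a_lo, a_hi, t_lo, t_hi, n=400):
    """Midpoint-rule approximation of the integral of lambda over A x I, cf. (4.1)."""
    dx = (a_hi - a_lo) / n
    dt = (t_hi - t_lo) / n
    x = a_lo + (np.arange(n) + 0.5) * dx
    t = t_lo + (np.arange(n) + 0.5) * dt
    values = intensity(x[:, None], t[None, :])   # evaluated on an n x n grid
    return float(values.sum() * dx * dt)

def example_intensity(x, t):
    """Hypothetical intensity: Gaussian spot in space, constant in time."""
    return 100.0 * np.exp(-x**2) * np.ones_like(t)

# Four detection events as rows (position, time).
events = np.array([[0.5, 1.0], [1.5, 6.0], [3.0, 2.0], [0.1, 4.9]])
print(count_photons(events, 0.0, 2.0, 0.0, 5.0))
print(expected_count(example_intensity, -1.0, 1.0, 0.0, 1.0))
```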


4.1.3 Measurement Devices 


Depending on the type of sensor used for photon detection, different models for
photonic imaging settings have been proposed. One commonality of all measurement
setups is that the spatial domain of observation $\Omega$ is discretized into detector regions,
so-called bins. We will assume that the detectors on all bins have identical physical
properties, and we denote the centers of such bins by $x \in \mathcal{X}$, with $\mathcal{X}$ being the set of all
bin centers. If a charge-coupled device (CCD) camera is used for detection, all bins
(the pixels of the sensor) can be observed simultaneously. This is e.g. the case in most
coherent X-ray experiments or astrophysical imaging. PET requires a tomographic
setup consisting of several photomultiplier tubes (PMT) surrounding the patient (see
e.g. [9]). In confocal fluorescence microscopy the most widely applied detectors are
based on avalanche photodiodes, which can measure photons in one bin at a time
only. Hence, the domain of observation $\Omega$ is typically scanned by physically moving
the specimen (or detector) at a fast pace. Temporally simultaneous photons can be
measured as well, requiring a different experimental setup (see e.g. [10]).

Most photon detectors rely on the photoelectric effect. With a certain probability
(the quantum efficiency), incident photons will release photo electrons on the detector
surface. Since single electrons cannot be detected reliably, the signal is typically
amplified by a cascade of electron multiplying systems. This introduces additional
noise due to the stochastic nature of the multiplying steps. Another complication is
the existence of dead times. The dead time of a detection device refers to the time
interval (after activation) during which it is unable to record another event. Dead
times can, for example, arise due to the necessity to recharge conductors in-between
measurements, or due to time delays caused by analog-to-digital conversion and data
storage. Details on the statistics of different detectors can be found in [11, Chap. 12].


4.1.4 Structure and Notation 


For the remainder of this chapter we will develop and discuss models for the right
part in Fig. 4.1 with different degrees of accuracy. The model choice mainly depends
on the total number of detected photons and on the spatial and temporal dependency
structure of the randomly generated photons. We will start with the Poisson model,
which is well-known and most common for many applications. It can be derived
immediately from (4.1) under the assumption of independence, which explains its
wide use in photonic imaging (see e.g. the reviews [7, 12] and the references therein).
However, if it is necessary to count photons on small time scales, or if independence
is not given, more refined modeling is required. In these situations, we turn
towards Bernoulli and Binomial models subsequently, and discuss to what extent
they are compatible with the aforementioned Poisson model. Finally we turn to the
case of large counting rates, which lead to Gaussian models based on asymptotic
normality. We discuss differences and commonalities arising from the different base
models and indicate in which situation which model should be used. This will be
linked to different examples from this book, where we discuss whether our assumptions are
met or not.

Let us introduce the basic notation used in this chapter. We will always assume
that any observation $y$ is the realization of a random variable $Y$, and we will denote by
$\mathbb{P}$ probabilities w.r.t. this random object. By $\mathbb{E}$ and $\mathbb{V}$ we will denote the expectation
and variance w.r.t. $\mathbb{P}$, respectively. The letters $\mathcal{P}$, $\mathcal{B}$ and $\mathcal{N}$ will denote the Poisson,
Binomial and normal distribution introduced below. Random variables will always
be denoted by capital letters $X$, $X_i$, $Z$ etc., and if we write i.i.d. for a sequence
$X_1, X_2, \ldots$ of random variables, this stands for independent identically distributed.


4.2 Poisson Modeling 


Suppose we have a perfect photon detector that registers the individual arrival times
of all emitted photons reaching a bin without missing any. We will focus on describing
a single bin for the moment to avoid notational difficulties. In this situation, the total
number of collected photons often can be modeled as Poissonian. A random variable
$X$ follows a Poisson law with parameter (intensity) $\mu > 0$, if

$$\mathbb{P}[X = j] = \frac{\mu^j}{j!}\, e^{-\mu}, \qquad j \in \mathbb{N}_0.$$

We write $X \sim \mathcal{P}(\mu)$. The following fundamental theorem about point processes
explains why the Poisson distribution often comes into play when modeling photon
counts:


Theorem 4.1 Suppose we observe a random number $N$ of photons at random arrival
times $0 \le t_1 < \cdots < t_N \le T$ such that

(a) for each choice of disjoint intervals $I_1, \ldots, I_n \subset [0, T]$, the random variables
$\#\{1 \le k \le N \mid t_k \in I_i\}$, $1 \le i \le n$, corresponding to the number of observed
photons during $I_i$ are independent, and

(b) there exists some integrable function $\mu$ on $(0, T]$ such that for any choice
$0 \le a < b \le T$ it holds

$$\mathbb{E}\big[\#\{1 \le k \le N \mid a \le t_k \le b\}\big] = \int_a^b \mu(t)\, dt.$$

Then, for all $0 \le a < b \le T$, the number of photons observed between time $a$
and time $b$ is Poisson distributed with parameter $\int_a^b \mu(t)\, dt$, i.e.

$$\#\{1 \le k \le N \mid a \le t_k \le b\} \sim \mathcal{P}\left(\int_a^b \mu(t)\, dt\right).$$

For the proof we refer to [13, Theorem 1.11.8]. In terms of probability theory,
this theorem implies that the point process $X := \sum_{k=1}^{N} \delta_{t_k}$, with $\delta_t$ denoting the Dirac
measure at $t$, is a Poisson point process with intensity $\mu$ if the stated assumptions are
satisfied.
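The statement of Theorem 4.1 can be made plausible with a small simulation that uses only its two assumptions. In the sketch below, time is cut into many short slots in which a photon arrives independently with probability $\mu(t)\,dt$ (a discrete stand-in for conditions (a) and (b)), and the counts in $[a, b]$ are compared with a Poisson law of parameter $\int_a^b \mu(t)\,dt$. The rate function, the slot discretisation and all names are assumptions for illustration.

```python
import math
import numpy as np

def mu(t):
    """Hypothetical arrival rate on [0, 1]; its integral over [0, 0.5] equals 10."""
    return 20.0 * (1.0 + np.cos(2.0 * np.pi * t))

def simulate_counts(a, b, t_end=1.0, n_slots=1000, n_runs=4000, seed=1):
    """Photon counts in [a, b] from independent Bernoulli arrivals per time slot."""
    rng = np.random.default_rng(seed)
    dt = t_end / n_slots
    t_mid = (np.arange(n_slots) + 0.5) * dt
    p = mu(t_mid) * dt                              # arrival probability per slot
    arrivals = rng.random((n_runs, n_slots)) < p    # independent across slots
    window = (t_mid >= a) & (t_mid <= b)
    return arrivals[:, window].sum(axis=1)

def poisson_pmf(j, mean):
    """P[X = j] for X ~ P(mean)."""
    return math.exp(-mean) * mean**j / math.factorial(j)

counts = simulate_counts(0.0, 0.5)
print("sample mean:", counts.mean(), " sample variance:", counts.var())
for j in (5, 10, 15):
    print(j, np.mean(counts == j), poisson_pmf(j, 10.0))
```

Mean and variance of the simulated counts should both lie close to 10, and the empirical frequencies should follow the Poisson probabilities up to sampling noise and the small bias of the finite slot width.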

Let us discuss these assumptions. Condition (b) underlies our whole modeling 
procedure as described in (4.1) and seems universally evident. Temporal indepen- 
dence of the arrival times in (a) is more critical but seems (at least approximately) 
reasonable in many imaging modalities where photons arise from a high-intensity 
source, including coherent X-ray imaging. However, if the photons arise from fluo- 
rescent markers, temporal independence can be violated due to hidden internal states 
of the fluorophores, energy transfer between different fluorophores on small time 
and spatial scales (e.g. FRET), or dead times of the detectors. 

If temporal independence is given, then Theorem 4.1 states that the number $Y_{x,t}$
of collected photons within a bin $B_x$ until time $t \in [0, T]$ can naturally be modeled
by a Poissonian random variable with intensity $\int_0^t \int_{B_x} \lambda(y, \tau)\, dy\, d\tau$. This gives rise
to the following model:

Poisson model


Let the spatial domain of observation $\Omega$ be discretized into bins $B_x$ with centers
$x \in \mathcal{X}$. We assume that our observations are given by a field $Y_t := (Y_{x,t})_{x \in \mathcal{X}}$
of random variables such that

$$Y_{x,t} \sim \mathcal{P}\left(\int_0^t \int_{B_x} \lambda(y, \tau)\, dy\, d\tau\right), \qquad x \in \mathcal{X},\ t \in [0, T] \tag{4.2}$$

for some intensity function $\lambda \ge 0$.


This is the basis of many popular models covering a variety of distinct applications. 
Examples include PET (see Vardi et al. [9]), astronomy and fluorescence microscopy 
(see Bertero et al. [7] or Hohage and Werner [12]), or a more subtle model for CCD 
cameras due to Snyder et al. [14, 15]. 

Note that so far we have assumed that all arriving photons are collected by the
detector. This will, however, never be the case due to several physical limitations, see
Fig. 4.2.

Fig. 4.2 Statistical photon thinning at the detector interface. Photons that reach the detection plane
can stay undetected due to various reasons. For example, they can fail to free a photo electron, miss
the sensitive regions of the detector bins, or arrive during the dead time caused by a previously
recorded photon

The specific efficiency depends strongly on the setup and can vary considerably.
In addition to different quantum efficiencies of different detectors, it might also
happen that the detector does not cover all of $\Omega$ or has some dead subregions (like
interfaces between individual elements). This causes a loss of measured photons and
hence a statistical thinning of the random variable $Y_{x,t}$. In this case, the actually
observed random variable $\tilde{Y}_{x,t}$ can be written as

$$\tilde{Y}_{x,t} = \sum_{i=1}^{Y_{x,t}} X_i \tag{4.3}$$

with Bernoulli random variables $X_i$ having success probability $\eta_i \in [0, 1]$ where
each $X_i$ indicates if the $i$th photon has been detected. If, in addition, the thinning
happens identically and independently for each photon, i.e. $X_i \sim \mathcal{B}(1, \eta)$ i.i.d., only the
parameter in the Poisson law (4.2) changes, but not its distributional structure. More
precisely, in this case it follows (see the Appendix) that

$$\tilde{Y}_{x,t} \sim \mathcal{P}\left(\eta \int_0^t \int_{B_x} \lambda(y, \tau)\, dy\, d\tau\right).$$

Consequently, the imperfectness of a detector (as long as the induced thinning happens
independently for each photon) can be seen as a scaling of the underlying photon
intensity $\lambda$ by an efficiency factor $\eta \in (0, 1]$. In agreement with Fig. 4.1 we can hence
assume that all physical processes causing a thinning have already been treated when
modeling $\lambda$ in the following.
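The effect of independent thinning is easy to check by simulation. In this sketch, Poisson counts are thinned by keeping each photon independently with probability $\eta$; the detected counts should then again have equal mean and variance, both close to $\eta$ times the original intensity. The numerical values and the helper name are illustrative assumptions.

```python
import numpy as np

def thin_counts(counts, eta, rng):
    """Keep each photon independently with probability eta (binomial thinning)."""
    return rng.binomial(counts, eta)

rng = np.random.default_rng(2)
intensity = 50.0   # Poisson parameter of the arriving photons (illustrative)
eta = 0.3          # detection efficiency (illustrative)

arriving = rng.poisson(intensity, size=100_000)
detected = thin_counts(arriving, eta, rng)

print("expected parameter:", eta * intensity)
print("sample mean:       ", detected.mean())
print("sample variance:   ", detected.var())
```

Equality of mean and variance is only a necessary property of the Poisson law; a full check would compare the whole histogram of `detected` with the Poisson probabilities.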

Besides this kind of independent thinning, a further important issue in many
imaging modalities is the dead time $\Delta t$ of the employed detector. Dead times can
vary significantly depending on the type of detector, but usually are in the range of
nanoseconds. If a photon arrives at time $t \in [0, T)$, the detector will only be able to
record the next photon arriving after $t + \Delta t$. Note that whenever $\Delta t > 0$, at most
$T/\Delta t$ photons can be detected during the whole measurement, which contradicts
(4.2) in the sense that $P[Y_{x,T} > T/\Delta t] = 0$ in this case. Such an upper limit on the
total number of detected photons can crucially change the distribution, which can,
e.g., be seen from the following fact proven in the appendix:


Theorem 4.2 Fix $x \in \mathcal{X}$ and let $I_1, \dots, I_m$ be a decomposition of $[0, T]$ into
disjoint intervals. Denote by $X_i$ the number of photons observed during $I_i$ in bin
$B_x$. Assume model (4.2), and suppose that $X_1, \dots, X_m$ are independent. Then the
conditional distribution given $Y_{x,T} = N$ of $(X_1, \dots, X_m)$ is multinomial with
parameter $N$ and probability vector $(p_1, \dots, p_m)$ where

\[
p_i = \frac{\int_{I_i} \int_{B_x} \lambda(y, \tau) \, dy \, d\tau}{\int_0^T \int_{B_x} \lambda(y, \tau) \, dy \, d\tau}.
\]


In other words, Theorem 4.2 states that, conditioning on the total number of photons,
the arrival times of individual photons behave like a Bernoulli process with
intensity $\tau \mapsto \int_{B_x} \lambda(y, \tau) \, dy$. This implies that conditioning on the total number
of photons introduces a dependency structure between the number of counts during
different time intervals. Consequently, if $\Delta t$ cannot be neglected, temporal independence
is not given anymore, hence corrupting the Poisson law, and different modeling
approaches are needed.
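
For two intervals ($m = 2$) the statement of Theorem 4.2 reduces to a Binomial law and can be verified exactly. The sketch below is an illustration under assumed values: `lam1` and `lam2` stand for the integrated intensities over $I_1$ and $I_2$, and all function names are hypothetical.

```python
import math


def poisson_pmf(k, lam):
    """P[X = k] for X ~ P(lam)."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def binomial_pmf(k, n, p):
    """P[X = k] for X ~ B(n, p)."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)


def conditional_pmf(k, total, lam1, lam2):
    """P[X1 = k | X1 + X2 = total] for independent X1 ~ P(lam1), X2 ~ P(lam2)."""
    joint = poisson_pmf(k, lam1) * poisson_pmf(total - k, lam2)
    return joint / poisson_pmf(total, lam1 + lam2)


def max_deviation(total, lam1, lam2):
    """Largest gap between the conditional law and B(total, lam1 / (lam1 + lam2))."""
    p1 = lam1 / (lam1 + lam2)
    return max(
        abs(conditional_pmf(k, total, lam1, lam2) - binomial_pmf(k, total, p1))
        for k in range(total + 1)
    )


if __name__ == "__main__":
    print(max_deviation(total=12, lam1=3.0, lam2=5.0))  # zero up to rounding error
```

The deviation vanishes up to floating point error, which reflects that the conditional counts are no longer independent: fixing $X_1$ determines $X_2 = N - X_1$.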


4.3 Bernoulli Modeling 


To measure the temporal structure of the incoming photons, counting as described
above is not sufficient. In such cases, photons are consecutively counted during
(short) time frames. We suppose that the discretization of the temporal measurement
process is refined such that temporal aggregation underlying the Poisson model is
not appropriate anymore. This is described by (equidistant) time frames, which are
consecutive intervals $I_1, \dots, I_n \subset [0, T]$ of equal length $\delta > 0$, chosen such that
the probability to observe more than one photon in each bin $B_x$ during any interval is
sufficiently close to 0, and separated by a waiting time $\varepsilon > 0$, which allows us to ignore
the dead time. In this situation, the following model is a reasonable approximation:


Bernoulli model 


For $x \in \mathcal{X}$ and $1 \le i \le n$ the random variable $Y_{x,i}$ indicating if a photon arrives
in bin $B_x$ during the time interval $I_i$ follows a Bernoulli distribution,

\[
Y_{x,i} \sim B(1, p_{x,i}), \tag{4.4}
\]

with success probability

\[
p_{x,i} \approx \int_{I_i} \int_{B_x} \lambda(y, \tau) \, dy \, d\tau. \tag{4.5}
\]


As mentioned before, the detector will hardly count all arriving photons, which
causes a statistical thinning as in (4.3). If the thinning happens independently of
the photon arrivals, we obtain $\tilde{Y}_{x,i} \sim B(1, \eta \, p_{x,i})$ with the probability $\eta$ that an
incident photon is detected, which immediately follows from $X \cdot Z \sim B(1, pp')$ if
$X \sim B(1, p)$ is independent of $Z \sim B(1, p')$.

In many imaging setups, it would be difficult to store the whole time series $Y_{x,i}$,
for instance due to memory limitations. Examples include fluorescence microscopy
setups like confocal, STED or 4Pi microscopy, or coherent X-ray imaging, where
millions of photons are observed in short times, which would require an unreasonably
fine time discretization. For other examples like SMS microscopy, however, the
temporal structure can be important (e.g. for adjusting temporal drifts, see e.g. [16,
17]) and hence most of the data of the above model has to be used. If temporal
dependencies are less important, it is sufficient to count photon arrivals in some
interval $I \subset [0, T]$ larger than $\delta$, i.e. to consider $Y_{x,I} := \sum_{I_i \subset I} Y_{x,i}$. The distribution
of $Y_{x,I}$ depends strongly on the temporal dependency structure of the $Y_{x,i}$. In case that
they are independent and $p_{x,i} = p_x$ for all $1 \le i \le n$, we obtain a Binomial model:


Binomial model 


For $x \in \mathcal{X}$ and $I \subset [0, T]$, the number of photons observed in the bin centered
at $x$ during the time interval $I$ is

\[
Y_{x,I} \sim B\left( \#\{I_i \subset I\}, \, p_x \right) \tag{4.6}
\]

with $p_{x,i} = p_x$ for all $1 \le i \le n$ and $p_{x,i}$ as in (4.5).


Note that if we proceed similarly with the thinned observations $\tilde{Y}_{x,i}$, we obtain
$\tilde{Y}_{x,I} \sim B(\#\{I_i \subset I\}, \eta \, p_x)$, which is the canonical thinning of (4.6), see e.g. [18].

Independence of the $Y_{x,i}$ is strongly connected to the photon source, as discussed
above. If $\varepsilon > \Delta t$, the dead times of the detectors have no influence on the temporal
dependency structure anymore. The second assumption, $p_{x,i} = p_x$ for all $1 \le i \le n$,
is equivalent to stationarity of the underlying photon source, which again depends
on the imaging modality. If, e.g., a freeze-dried sample is imaged sufficiently fast,
then this assumption is reasonable.

Besides temporal dependencies, the field of random variables can also have a spa- 
tial dependency structure. In many modalities the random variables are independent 
for different pixels or voxels x, but on sufficiently small scales some dependency can 
occur, e.g., due to energy transfer between molecules. 


4.3.1 Law of Small Numbers 


It is a fundamental and well-known fact that a Binomial distribution can in certain 
situations be approximated by a Poissonian distribution. In this section, we will 
discuss how this provides a link between the initial Poisson modeling (4.2) and the 
preceding Bernoulli modeling (4.6). To this end, we recall the so-called law of small 
numbers, which will be stated in terms of Le Cam’s theorem [19]. For the moment we 
suppress dependencies on x and consider only a single Binomial random variable, 
corresponding to a fixed bin. 


Theorem 4.3 (Law of small numbers) Let $X_1, \dots, X_m$ be independent and Bernoulli
distributed with success probabilities $q_1, \dots, q_m$. Then the distribution of $X :=
X_1 + \dots + X_m$ can be approximated by $P(\lambda_m)$ with $\lambda_m = -\sum_{i=1}^m \log(1 - q_i)$.
More precisely it holds that

\[
\sum_{k=0}^{\infty} \left| P[X = k] - \frac{\lambda_m^k}{k!} \exp(-\lambda_m) \right| \le 2 \sum_{i=1}^m \left( \log(1 - q_i) \right)^2. \tag{4.7}
\]


Fig. 4.3 Law of small numbers. Figure (a) shows the probabilities for a sum of 50 Bernoulli
variables with $q_i = 0.1$ and the respective Poisson approximation with $\lambda = -50 \log(0.9) \approx 5$.
Figure (b) depicts the left hand side of (4.7) and the corresponding upper bound (right hand side
of (4.7)) of Theorem 4.3 for increasing $m$ and $q_i := 5/m$


For a textbook proof we refer to [20, Theorem 5.1]. Figure 4.3 visualizes the law of
small numbers. Note that the bound on the right-hand side of (4.7) can be simplified
by using $\log(1 - x) \le -x$, resulting in $\sum_{i=1}^m q_i^2$. We furthermore refer to [21,
Proposition 4.3 and 4.4], where bounds of the supremum instead of the sum over $k$
on the left-hand side are given. Note that Theorem 4.3 can be generalized to dependent
Bernoulli random variables at the price of a worse upper bound, see e.g. [20,
Theorem 5.5].

A classical example for this law is the situation when $q_i = q_m$ for all $1 \le i \le m$
and $q_m \cdot m$ converges to some $\lambda > 0$, i.e., $q_m \sim 1/m$. In this case we may use
$\log(1 - x) \approx -x$ for small $x$ to obtain $\lambda_m \approx m q_m \to \lambda$ and $2 \sum_{i=1}^m (\log(1 - q_i))^2 \approx
2 \sum_{i=1}^m q_m^2 = 2 m q_m^2 \sim 1/m \to 0$ as $m \to \infty$, i.e., the Binomial distribution of $X$
converges rapidly to the Poisson distribution with parameter $\lambda$.
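
Both sides of (4.7) can be evaluated exactly for small examples. The sketch below is an illustration with hypothetical function names: it builds the distribution of a sum of independent Bernoulli variables by convolution and compares it with the Poisson law with parameter $\lambda_m = -\sum_i \log(1 - q_i)$. The infinite sum over $k$ is truncated after `tail` additional terms, which is an assumption that is harmless for the small parameters used here.

```python
import math


def bernoulli_sum_pmf(qs):
    """Exact pmf of X = X_1 + ... + X_m for independent Bernoulli(q_i), by convolution."""
    pmf = [1.0]
    for q in qs:
        nxt = [0.0] * (len(pmf) + 1)
        for k, pk in enumerate(pmf):
            nxt[k] += pk * (1 - q)
            nxt[k + 1] += pk * q
        pmf = nxt
    return pmf


def poisson_pmf_list(lam, kmax):
    """Poisson probabilities for k = 0..kmax via the recurrence p_k = p_{k-1} * lam / k."""
    probs = [math.exp(-lam)]
    for k in range(1, kmax + 1):
        probs.append(probs[-1] * lam / k)
    return probs


def le_cam_check(qs, tail=200):
    """Return (lambda_m, truncated left-hand side of (4.7), right-hand side of (4.7))."""
    lam = -sum(math.log(1 - q) for q in qs)
    pmf = bernoulli_sum_pmf(qs) + [0.0] * tail
    pois = poisson_pmf_list(lam, len(pmf) - 1)
    lhs = sum(abs(a - b) for a, b in zip(pmf, pois))
    rhs = 2 * sum(math.log(1 - q) ** 2 for q in qs)
    return lam, lhs, rhs


if __name__ == "__main__":
    for m in (50, 100, 500):
        lam, lhs, rhs = le_cam_check([5 / m] * m)
        print(f"m={m}: lambda_m={lam:.4f}, lhs={lhs:.5f}, bound={rhs:.5f}")
```

With $q_i = 5/m$ both the distance and the bound shrink as $m$ grows, in line with the $1/m$ behaviour described above.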

On the other hand, if the success probabilities $q_i = q \in (0, 1)$ are fixed, the right-hand
side of (4.7) diverges. This seems intuitive, as in this situation convergence
towards a normal distribution has to be expected (cf. Sect. 4.4.1 below). This is in
line with the observation that a Poisson distribution with growing parameter $\lambda_m =
-m \log(1 - q)$ converges towards a normal distribution (cf. Sect. 4.4.2 below).

Let us now compare the two Poisson laws arising from Theorem 4.3 and (4.2).
According to (4.4), our observations are Binomial random variables with success
probability

\[
p_{x,i} \approx \int_{I_i} \int_{B_x} \lambda(y, \tau) \, dy \, d\tau,
\]

where we used that the probability to observe more than one photon is close to 0.
Hence, if we denote the largest time in $I_m$ by $t_m$, and use again $\log(1 - x) \approx -x$,
then the number of photons observed until time $t_m$ is approximately Poisson distributed
with parameter $\lambda_{x,m} = p_{x,1} + \dots + p_{x,m} \approx \int_0^{t_m} \int_{B_x} \lambda(y, \tau) \, dy \, d\tau$, ignoring
the waiting times. This is in good agreement with (4.2). According to (4.7), the error
in this approximation is bounded by

\[
2 \sum_{i=1}^m \left( \log(1 - p_{x,i}) \right)^2 \le 2 \lambda_{x,m} \max_{1 \le i \le m} p_{x,i} \le C \max_{1 \le i \le m} |I_i|,
\]

which shows that the approximation is valid whenever the temporal discretization is sufficiently fine.


4.4 Gaussian Modeling 


4.4.1 As Approximation of the Binomial Model 


Besides the approximation by a Poisson distribution, it is well-known that a Binomial
model can also be approximated by a Gaussian one under suitable circumstances.
Let us start with the Bernoulli model (4.4) and suppose that all $Y_{x,i}$ are independent
with $p_{x,i} = p_x$. If we are interested in the total number of counts $Y_x := \sum_{i=1}^n Y_{x,i}$ in
bin $x$, the de Moivre-Laplace theorem states that

\[
\frac{Y_x - n p_x}{\sqrt{n p_x (1 - p_x)}} \to Z \quad \text{as } n \to \infty \tag{4.8}
\]

in distribution, where $Z \sim N(0, 1)$ follows a standard normal distribution. Note
that $\frac{Y_x - n p_x}{\sqrt{n p_x (1 - p_x)}}$ is just the centered and standardized version of the total number of
counts $Y_x$. This implies that the distribution of $Y_x$ can be approximated by a Gaussian
distribution with mean $n p_x$ and variance $n p_x (1 - p_x)$ if $n$ is sufficiently large. This
gives rise to a first Gaussian model:


Gaussian model I 


For each $x \in \mathcal{X}$, the number of photons observed in the bin centered at $x$ up to
time $T$ is

\[
Y_{x,T} \sim N\left( n p_x, \, n p_x (1 - p_x) \right) \tag{4.9}
\]

where $n = n(T) \approx T/\delta$ with the length $\delta$ of the individual time frames.


The rate of convergence in (4.8) can be made more precise. For instance a special
case of the Berry-Esseen theorem states

\[
\sup_{y \in \mathbb{R}} \left| P\left[ \frac{Y_x - n p_x}{\sqrt{n p_x (1 - p_x)}} \le y \right] - \Phi(y) \right|
\le \frac{\sqrt{10} + 3}{6 \sqrt{2\pi}} \cdot \frac{p_x^2 + (1 - p_x)^2}{\sqrt{n p_x (1 - p_x)}}, \tag{4.10}
\]

where $\Phi$ denotes the distribution function of $N(0, 1)$, i.e.,

\[
\Phi(y) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{y} \exp\left( -\frac{x^2}{2} \right) dx.
\]


In fact, the constant on the right-hand side of (4.10) cannot be improved [22]. An
interpretation of this theorem is that the approximation leading to the model (4.9) is
reasonable as soon as $n p_x (1 - p_x) \ge 9$, which implies the right-hand side of (4.10)
to be bounded by $\frac{\sqrt{10} + 3}{18 \sqrt{2\pi}} \approx 0.137$.

If the success probabilities $p_{x,i}$ do vary in $i$, the de Moivre-Laplace theorem (4.8)
cannot be applied immediately. However, it is still possible, under certain conditions,
to derive an approximate Gaussian model of the form (4.9) by applying the Lindeberg
central limit theorem (see e.g. [23]). It states that the sum $Y_x$, after centering and
standardization, still converges to $N(0, 1)$ in distribution even for non-identically
distributed $Y_{x,i}$. This motivates a second Gaussian model:


Gaussian model II 


For each $x \in \mathcal{X}$, the number of photons observed in the bin centered at $x$ up to
time $T$ is

\[
Y_{x,T} \sim N\left( \sum_{i=1}^n p_{x,i}, \; \sum_{i=1}^n p_{x,i} (1 - p_{x,i}) \right). \tag{4.11}
\]


Note that, if the random variables Yx ; are dependent, the type of dependency very 
much determines whether a central limit theorem is still valid (with different limiting 
variance), see e.g. [24] or [25-27] for mixing sequences, and [28] for martingale 
difference sequences, to mention two large classes of examples. 


4.4.2 As Approximation of the Poisson Model 


The Poisson model in (4.2) can also be approximated by a Gaussian one. This relies
on the fact that the Poisson distribution is infinitely divisible, which means that
whenever $X \sim \mathrm{Poi}(\mu)$, then $X$ can be represented as $X = X_1 + \dots + X_n$ for any $n \in
\mathbb{N}$ with i.i.d. random variables $X_1, \dots, X_n \sim \mathrm{Poi}(\mu/n)$. Consequently, the central
limit theorem states that

\[
\frac{X - \mu}{\sqrt{\mu}} \to Z \quad \text{as } \mu \to \infty
\]

in distribution, with $Z \sim N(0, 1)$. The general Berry-Esseen theorem can also be used to bound the
error of an approximation of $\frac{X - \mu}{\sqrt{\mu}}$ by $Z$, namely one obtains (see also [29])

\[
\sup_{y \in \mathbb{R}} \left| P\left[ \frac{X - \mu}{\sqrt{\mu}} \le y \right] - \Phi(y) \right| \le \frac{1}{2 \sqrt{\mu}}. \tag{4.12}
\]

Hence, if $\mu$ is sufficiently large, the distribution of $X$ can be approximated by
a Gaussian distribution with mean and variance $\mu$. If we suppose that $Y_{x,t}$ satisfies
(4.2) and that $\int_0^t \int_{B_x} \lambda(y, \tau) \, dy \, d\tau \to \infty$ as $t \to \infty$, then the above reasoning gives
rise to another Gaussian model:


Gaussian model III


For each $x \in \mathcal{X}$, the number of photons observed in the bin centered at $x$ up to
time $t$ is

\[
Y_{x,t} \sim N\left( \int_0^t \int_{B_x} \lambda(y, \tau) \, dy \, d\tau, \; \int_0^t \int_{B_x} \lambda(y, \tau) \, dy \, d\tau \right). \tag{4.13}
\]
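
The quality of the Gaussian approximation of a Poisson variable can be inspected in the same way as for the Binomial case. The sketch below uses hypothetical function names and a truncation of the Poisson support at `kmax`, which is an assumption: it computes the distance between the distribution function of $(X - \mu)/\sqrt{\mu}$ and $\Phi$ and compares it with $1/(2\sqrt{\mu})$ as in (4.12).

```python
import math


def std_normal_cdf(y):
    """Distribution function of N(0, 1)."""
    return 0.5 * (1.0 + math.erf(y / math.sqrt(2.0)))


def poisson_kolmogorov_distance(mu, kmax=None):
    """sup_y |P[(X - mu)/sqrt(mu) <= y] - Phi(y)| for X ~ Poi(mu), support cut at kmax."""
    if kmax is None:
        kmax = int(mu + 20.0 * math.sqrt(mu) + 20.0)
    sd = math.sqrt(mu)
    pk = math.exp(-mu)  # P[X = 0]
    cdf_left, worst = 0.0, 0.0
    for k in range(kmax + 1):
        phi = std_normal_cdf((k - mu) / sd)
        cdf_right = cdf_left + pk
        worst = max(worst, abs(cdf_left - phi), abs(cdf_right - phi))
        cdf_left = cdf_right
        pk *= mu / (k + 1)  # recurrence P[X = k + 1] = P[X = k] * mu / (k + 1)
    return worst


if __name__ == "__main__":
    for mu in (4.0, 25.0, 100.0):
        print(mu, poisson_kolmogorov_distance(mu), 1.0 / (2.0 * math.sqrt(mu)))
```

The distance decays roughly like $1/\sqrt{\mu}$, which indicates how many expected photons per bin are needed before Gaussian model III becomes a reasonable substitute for (4.2).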


4.4.3 Comparison 


Let us briefly compare the Gaussian models I-III in (4.9), (4.11) and (4.13) respectively.
It is clear that (4.11) is a generalization of (4.9) to the case of non-identical
success probabilities $p_{x,i}$, and both coincide if $p_{x,i}$ is independent of $i$. To compare
(4.11) with (4.13), we recall our previous computation that $p_{x,1} + \dots + p_{x,n} \approx
\int_0^{t_n} \int_{B_x} \lambda(y, \tau) \, dy \, d\tau$, where $t_n$ is the largest time in the sub-interval $I_n$.
Consequently, (4.11) and (4.13) differ only in the variance, by the factors $1 - p_{x,i}$, and this
difference is usually small. Hence, all three Gaussian models are in good agreement, and (4.13) can be
considered the simplest one, which should be used.


4.4.4 Thinning 


Taking into account the detection efficiency $\eta \in [0, 1]$ as discussed before, we will
arrive at models similar to (4.9), (4.11) and (4.13) with the only difference being that
$p_x$, $p_{x,i}$ or $\lambda$ are multiplied by $\eta$. In this sense, the canonical thinning of the Poisson
or Binomial models carries over to the Gaussian one.


4.4.5 Variance Stabilization


Note that the variance in the Gaussian models I-III is always inhomogeneous, which
hinders data analysis with standard methods and causes further difficulties. This can
be overcome by variance stabilization. The most popular choice is the celebrated
Anscombe transform, which is applied to the Poisson model (4.2) to obtain asymptotically
a normal distribution with variance 1. It is based on the following result (see
e.g. [30, Lemma 1]):


Lemma 4.1 (Anscombe's transform) Let $\mu > 0$ and $Y \sim P(\mu)$ be a Poisson distributed
random variable. Then it holds for all $c > 0$ that

\[
E\left[ 2 \sqrt{Y + c} \right] = 2 \sqrt{\mu} + \frac{4c - 1}{4 \sqrt{\mu}} + O\left( \frac{1}{\mu^{3/2}} \right),
\]

\[
V\left[ 2 \sqrt{Y + c} \right] = 1 + \frac{3 - 8c}{8 \mu} + O\left( \frac{1}{\mu^2} \right).
\]


From this we can conclude that the choice $c = 3/8$ ensures that the variance of
$2 \sqrt{Y + c}$ no longer depends on the parameter $\mu$ up to second order. To reduce
the bias, $c = 1/4$ is the best choice. Furthermore, applying this result to the Poisson
model in (4.2) gives rise to a fourth Gaussian model:


Gaussian model IV


For each $x \in \mathcal{X}$, denote the number of photons observed in the bin centered at
$x$ up to time $t$ by $Y_{x,t}$. Then we assume

\[
2 \sqrt{Y_{x,t} + \frac{3}{8}} \sim N\left( 2 \left( \int_0^t \int_{B_x} \lambda(y, \tau) \, dy \, d\tau \right)^{1/2}, \; 1 \right) \tag{4.14}
\]

for each $x \in \mathcal{X}$.


We emphasize the importance of the model (4.14) in statistics, as it turns out to
be equivalent in a strict sense to the previously discussed Poisson model (4.2) as the
total number of photons (and hence the parameter $t$) tends to $\infty$ (see e.g. [31-33]).
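
The effect of the constant $c$ in the Anscombe transform can be illustrated by computing mean and variance of $2\sqrt{Y + c}$ exactly from the Poisson probabilities. The sketch below is an illustration with hypothetical function names; the Poisson support is truncated far in the tail, which is an assumption that does not visibly affect the moments for the moderate $\mu$ used here.

```python
import math


def poisson_pmf_list(mu, kmax):
    """Poisson probabilities for k = 0..kmax via the recurrence p_k = p_{k-1} * mu / k."""
    probs = [math.exp(-mu)]
    for k in range(1, kmax + 1):
        probs.append(probs[-1] * mu / k)
    return probs


def transformed_moments(mu, c):
    """Mean and variance of 2 * sqrt(Y + c) for Y ~ P(mu)."""
    kmax = int(mu + 20.0 * math.sqrt(mu) + 20.0)
    probs = poisson_pmf_list(mu, kmax)
    mean = sum(p * 2.0 * math.sqrt(k + c) for k, p in enumerate(probs))
    second = sum(p * 4.0 * (k + c) for k, p in enumerate(probs))
    return mean, second - mean ** 2


if __name__ == "__main__":
    mu = 20.0
    for c in (0.0, 0.25, 0.375):
        mean, var = transformed_moments(mu, c)
        print(f"c={c}: bias={mean - 2.0 * math.sqrt(mu):+.5f}, variance={var:.5f}")
```

In this setting $c = 3/8$ should give a variance closest to 1, while $c = 1/4$ should give the smallest bias relative to $2\sqrt{\mu}$, as stated after Lemma 4.1.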


4.5 Conclusion 


In this chapter we introduced models for photonic imaging setups with different
degrees of accuracy. The most common and basic Poisson model (4.2) is accurate as
soon as the temporal dependency can be neglected and the detector has no significant
dead time. If furthermore the number of observed photons is sufficiently large on
each bin, then the Gaussian model (4.13) can be used. In case of significant temporal
dependency, the Bernoulli model (4.4) with time resolved individual photon arrivals
or the resulting Binomial model (4.6) should be considered instead.

An overview of appropriate model choices for the various imaging techniques
discussed previously is provided in Fig. 4.4.

In fluorescence microscopy, STED based methods, which scan the sample pix- 
elwise, record about 10-100 photons per fluorescent marker. Due to low temporal 
dependencies, we are thus in the scope of the binomial or Poisson models [3]. Even 
though a Gaussian approximation seems questionable as in regions of low intensities 
only a few photons per bin can be collected, it has been successfully applied employ- 
ing variance stabilizing techniques [34]. In order to analyze STORM/PALM data, 
the full range of modeling approaches is applied. Individual frames contain spots 
with single or several photons and weak temporal dependency, calling for Bernoulli, 
binomial, or Poisson models, while Gaussian approximations are used successfully 
for drift and rotational corrections [17]. FRET/MIET based imaging heavily relies 
on the interactions of fluorescent markers, so that the assumption of temporal inde- 
pendence is violated. This makes the Bernoulli model the model of choice, or if more 
photons are counted, also the Binomial model can be applied [4, 5]. 

Another example in the scope of the Bernoulli model is the 3-photon correlation
technique (see e.g. Chap. 16), where molecular structures are probed by femtosecond
X-ray pulses. This leads to a high number of images consisting of a few photons
only, out of which only triples are used. Inference based on this sequence of images
is additionally complicated by rotations of the single target molecules [1].


[Figure 4.4: chart with the model columns Bernoulli, Binomial, Poisson and Gauss and one row per
imaging method: STED (A01, A04, A07), PALM/STORM (A04, A06), FRET/MIET (A03, A05, B08),
near field (B07, C01, C02, C10), 3-photon corr. (B04), far field (B10, C01, C02, C10)]

Fig. 4.4 Overview of viable model choices for different imaging methods. Projects of the SFB
755 associated with the respective methods are marked in gray

X-ray diffraction imaging also allows for a whole range of models. At first glance
it seems that a Gaussian model is sufficient, as in total millions of photons are
collected. However, depending on the specific setup, the photon intensity $\lambda$ may vary
strongly over the detection region. If imaging is performed in a near-field regime, as
e.g. in many X-ray microscopy setups, the number of photons in the lower intensity
regions is about one order of magnitude lower than in the high intensity regions,
allowing for a Gaussian model. In contrast to this are far field methods where on 
high intensity bins 10* photons can be collected, but in low intensity regions only a 
handful of photons arrive, making a Binomial and/or Poisson model more suitable
[12].


Acknowledgements We are grateful to Simon Maretzke, Tim Salditt and Britta Vinçon for several


Appendix: Poisson Thinning


Let $\mu > 0$, $\eta \in (0, 1)$ and suppose $Y \sim P(\mu)$, $X_1, X_2, \dots \sim B(1, \eta)$ independent.
The $\eta$-thinning of $Y$ is defined as

\[
\tilde{Y} = \sum_{i=1}^{Y} X_i.
\]

We will now show that the distribution of $\tilde{Y}$ is still Poissonian, but with parameter
$\eta \mu$. To this end, observe that the probability of $\tilde{Y}$ being $k$ is given by the sum over
all probabilities of $Y$ being $l$ and exactly $k$ out of the first $l$ $X_i$'s being 1, i.e.

\[
P[\tilde{Y} = k] = \sum_{l=k}^{\infty} P\left[ Y = l, \; \sum_{i=1}^{l} X_i = k \right]
= \sum_{l=k}^{\infty} P[Y = l] \, P\left[ \sum_{i=1}^{l} X_i = k \right]
\]

by independence. Inserting the Poisson distribution of $Y$ and the Binomial distribution
of $\sum_{i=1}^{l} X_i$ gives

\[
\begin{aligned}
P[\tilde{Y} = k] &= \sum_{l=k}^{\infty} \frac{\mu^l}{l!} \exp(-\mu) \binom{l}{k} \eta^k (1 - \eta)^{l-k} \\
&= \frac{\exp(-\mu)}{k!} (\eta \mu)^k \sum_{l=k}^{\infty} \frac{(\mu (1 - \eta))^{l-k}}{(l-k)!} \\
&= \frac{\exp(-\mu)}{k!} (\eta \mu)^k \exp(\mu (1 - \eta)) \\
&= \frac{(\eta \mu)^k}{k!} \exp(-\mu \eta),
\end{aligned}
\]

which proves $\tilde{Y} \sim P(\eta \mu)$.


Appendix: Conditioned Poisson Processes 


Suppose we observe a random number $N$ of photons at random arrival times $0 \le
t_1 < \dots < t_N \le T$ such that the number of photons between time $a$ and time $b$ is
Poisson distributed with parameter $\int_a^b \mu(t) \, dt$ for a fixed function $\mu \ge 0$. Given a
decomposition of $[0, T]$ into disjoint intervals $I_1, \dots, I_m$, denote by

\[
Y_i = \#\{ 1 \le j \le N \mid t_j \in I_i \}, \qquad 1 \le i \le m,
\]

the number of photons observed during $I_i$. Assume furthermore that $Y_1, \dots, Y_m$ are
independent. We will now show that the conditional distribution given $N = n$ of
$(Y_1, \dots, Y_m)$ is multinomial with parameter $n$ and probability vector $(p_1, \dots, p_m)$
where

\[
p_i = \frac{\int_{I_i} \mu(t) \, dt}{\int_0^T \mu(t) \, dt}.
\]

To this end, let $n_1, \dots, n_m \in \mathbb{N}_0$ such that $\sum_{i=1}^m n_i = n$. Then we have

\[
\begin{aligned}
P[Y_1 = n_1, \dots, Y_m = n_m \mid N = n]
&= \frac{P[Y_1 = n_1, \dots, Y_m = n_m, N = n]}{P[N = n]} \\
&= \frac{\prod_{i=1}^m \exp\left( -\int_{I_i} \mu(t) \, dt \right) \frac{1}{n_i!} \left( \int_{I_i} \mu(t) \, dt \right)^{n_i}}
{\exp\left( -\int_0^T \mu(t) \, dt \right) \frac{1}{n!} \left( \int_0^T \mu(t) \, dt \right)^{n}} \\
&= \frac{n!}{n_1! \cdots n_m!} \prod_{i=1}^m p_i^{n_i},
\end{aligned}
\]

which proves the claim.


Chapter 5
Inverse Problems


Thorsten Hohage, Benjamin Sprung and Frederic Weidling 


The shortest path between two truths in the real domain passes 
through the complex domain. 
—Jacques Hadamard 


5.1 Introduction 


5.1.1 What Is an Inverse Problem? 


Generally speaking, inverse problems consist in the reconstruction of causes
for observed effects. In imaging applications the cause is usually a probe and the effect
is the observed data. The corresponding forward problem then consists in predicting
experimental data given perfect knowledge of the probe. In some sense solving an
inverse problem means “computing backwards”, which is usually more difficult
than solving the forward problem.

To model this kind of problem mathematically we describe the imaging system
or experimental setup by a forward operator $F : X \to Y$ between Banach spaces
$X$, $Y$, which maps a probe $f \in X$ to the corresponding effect $g \in Y$. Then the inverse
problem is, given data $g \in Y$, to find a solution $f \in X$ to the equation

\[
F(f) = g. \tag{5.1}
\]


5.1.2 Ill-Posedness and Regularization

A first obvious question to ask is whether or not a probe $f$ is uniquely determined
by the data $g$, i.e. if the operator $F$ is injective. Such questions are addressed e.g. in
Sect. 13.6.6. Given uniqueness, one might try to find the solution $f$ to (5.1) by just
applying the inverse operator $F^{-1}$, which gives another reason to call the problem
an inverse problem. However, in practice this can cause several problems, due to
the problem being ill-posed.

According to J. Hadamard a problem is called well-posed if the following condi- 
tions are satisfied: 


1. There exists a solution. 
2. The solution is unique. 
3. The solution depends continuously on the data (stability). 


Otherwise the problem is called ill-posed. 

An inverse problem in the form of an operator equation (5.1) is well-posed if $F$ is
surjective (such that for all $g$ there exists a solution), injective (such that the solution is
unique) and if $F^{-1}$ is continuous (guaranteeing stability). For many inverse problems
in practice only the third condition is violated, and ill-posedness in the narrower sense
often refers to this situation: The reconstruction of causes from observed effects is
unstable since very different causes may have similar effects.

The remedy against ill-posedness is regularization: To obtain stable solutions
of inverse problems one constructs a family of continuous operators $R_\alpha : Y \to X$
parameterized by a parameter $\alpha > 0$ converging pointwise to the discontinuous
inverse $F^{-1}$:

\[
R_\alpha \approx F^{-1}, \qquad \lim_{\alpha \to 0} R_\alpha(F(f)) = f \quad \text{for all } f \in X. \tag{5.2}
\]

We will discuss several generic constructions of such families of stable approximate
inverses $R_\alpha$ in Sect. 5.2.


5.1.3 Examples 


5.1.3.1 Numerical Differentiation 


In our first example we consider the forward operator given by integration. We fix
the free integration constant by working in spaces of functions with mean zero,
$L^2_\diamond([0, 1]) := \{ f \in L^2([0, 1]) : \int_0^1 f(x) \, dx = 0 \}$. Let $F : L^2_\diamond([0, 1]) \to L^2_\diamond([0, 1])$
be given by

\[
(F(f))(y) := \int_0^y f(x) \, dx + c(f), \qquad y \in [0, 1], \tag{5.3}
\]

where $c(f)$ is such that $F(f) \in L^2_\diamond$. The corresponding inverse problem described
by the operator equation (5.1) is to compute the derivative $g'$. If $g = F(f)$, then
the existence of a solution to the inverse problem is guaranteed and this solution is
unique. Now assume that instead of the exact data $g$ we are given noisy data $g^{\mathrm{obs}}$ that
fulfill

\[
\| g - g^{\mathrm{obs}} \|_{L^2} \le \delta,
\]

for some small noise level $\delta > 0$. For example this could be one of the functions

\[
g_n^{\mathrm{obs}}(x) = g(x) + \delta \sqrt{2} \sin(\pi n x).
\]

As the derivatives are given by $(g_n^{\mathrm{obs}})'(x) = g'(x) + \pi n \delta \sqrt{2} \cos(\pi n x)$, we have

\[
\| (g_n^{\mathrm{obs}})' - g' \|_{L^2} = \pi n \delta.
\]

This illustrates the typical ill-posedness of inverse problems: amplification of noise
may be arbitrarily large for naive application of the inverse $F^{-1}$. This is the main
difficulty one has to cope with. Our toy example illustrates another typical feature of
inverse problems: Each function in the image of the operator $F$ defined above is at
least once differentiable. Also for many other inverse problems the forward operator
is smoothing in the sense that the output function has higher smoothness than the
input function, and this property causes the instability of the inverse problem.
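
The noise amplification in this toy example can be reproduced numerically. The following is a minimal sketch, assuming a midpoint quadrature rule with a fixed number of sample points for the $L^2([0, 1])$ norm; all function names are illustrative. It evaluates the norm of the perturbation $\delta \sqrt{2} \sin(\pi n x)$ and of its derivative for several $n$.

```python
import math


def l2_norm(func, samples=20000):
    """Midpoint-rule approximation of the L^2([0, 1]) norm of func."""
    h = 1.0 / samples
    return math.sqrt(h * sum(func((i + 0.5) * h) ** 2 for i in range(samples)))


def perturbation(n, delta):
    """Data error g_n^obs - g."""
    return lambda x: delta * math.sqrt(2.0) * math.sin(math.pi * n * x)


def perturbation_derivative(n, delta):
    """Error of the derivative, (g_n^obs)' - g'."""
    return lambda x: math.pi * n * delta * math.sqrt(2.0) * math.cos(math.pi * n * x)


if __name__ == "__main__":
    delta = 1e-3
    for n in (1, 10, 100):
        data_error = l2_norm(perturbation(n, delta))
        derivative_error = l2_norm(perturbation_derivative(n, delta))
        print(f"n={n}: data error={data_error:.2e}, derivative error={derivative_error:.2e}")
```

The data error stays at $\delta$ for every $n$, while the error in the derivative grows linearly in $n$, so no bound on the data error alone controls the reconstruction error.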


5.1.3.2 Fluorescence Microscopy 


In fluorescence microscopy one is interested in recovering the density $f$ of fluorescent
markers in some specimen in $\mathbb{R}^d$. The probe is sampled by a laser beam, and one
detects fluorescent photons. In confocal microscopy, spatial resolution is achieved
by focusing the laser beam with a lens and collecting fluorescent photons with the
same lens such that out-of-focus fluorescent photons can be blocked by a pinhole.

Let $y \in \mathbb{R}^d$ be the focal point and assume that the probability (density) that for
the focus point $y \in \mathbb{R}^d$ a fluorescent photon emitted by a marker at point $x \in \mathbb{R}^d$ is
detected is $k(x - y)$. $k$ is called the point-spread function, and we assume here that
it is spatially invariant. Then our problem is described by the operator equation

\[
g(y) = (F(f))(y) = \int k(y - x) f(x) \, dx,
\]

i.e. the observation $g$ is given by a convolution of the marker density $f$ with the point-spread
function $k$. As convolution will usually blur an image, the forward operator
is smoothing. Smoother kernels will lead to stronger smoothing.
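
A discrete one-dimensional analogue of this convolution shows the blurring effect. The sketch below is an illustration only: the Gaussian shape of the point-spread function, the zero padding at the boundary and the function names are assumptions and are not taken from the text.

```python
import math


def gaussian_psf(width, radius):
    """Normalized discrete Gaussian point-spread function on offsets -radius..radius."""
    weights = [math.exp(-0.5 * (j / width) ** 2) for j in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def blur(density, psf):
    """Discrete analogue of g(y) = sum_x k(y - x) f(x), with zero padding."""
    radius = len(psf) // 2
    n = len(density)
    out = [0.0] * n
    for y in range(n):
        for j, kj in enumerate(psf):
            x = y - (j - radius)  # psf[j] holds the kernel value at offset y - x = j - radius
            if 0 <= x < n:
                out[y] += kj * density[x]
    return out


if __name__ == "__main__":
    markers = [0.0] * 41
    markers[15] = markers[25] = 1.0  # two point-like markers
    for width in (2.0, 4.0):
        image = blur(markers, gaussian_psf(width, radius=8))
        print(f"width={width}: peak={max(image):.3f}, total={sum(image):.3f}")
```

The total signal is preserved, but the two sharp markers are spread out, and a wider point-spread function lowers the peaks further.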



5.1.4 Choice of Regularization Parameters and Convergence 
Concepts 


Due to the ill-posedness discussed above it is essential to take into account the effects
of noise in the observed data. Let $f^\dagger \in X$ denote the unknown exact solution, and
first assume data $g^\delta$ with deterministic errors such that

\[
\| g^\delta - F(f^\dagger) \|_Y \le \delta. \tag{5.4}
\]

As mentioned above, regularization of an ill-posed operator equation (5.1) with an
injective operator $F$ consists in approximating the discontinuous inverse operator
$F^{-1}$ by a pointwise convergent family of continuous operators $R_\alpha : Y \to X$, $\alpha > 0$.
This immediately gives rise to the question which operator in the family should be
chosen for the reconstruction, i.e. how to choose the parameter $\alpha$. Usually the starting
point of deterministic error analysis in regularization theory is the following splitting
of the reconstruction error:

\[
\| R_\alpha(g^\delta) - f^\dagger \| \le \| R_\alpha(g^\delta) - R_\alpha(F(f^\dagger)) \| + \| R_\alpha(F(f^\dagger)) - f^\dagger \|. \tag{5.5}
\]

The first term on the right hand side is called propagated data noise error, and
the second term is referred to as approximation error or bias. Due to pointwise
convergence (see (5.2)), the bias tends to 0 as $\alpha \to 0$. Hence, to control this error
term, we should choose $\alpha$ as small as possible. However, as $R_\alpha$ converges pointwise
to the discontinuous operator $F^{-1}$, the Lipschitz constant (or operator norm in the
linear case) of $R_\alpha$ will explode as $\alpha \to 0$, and hence also the propagated data noise
error. Therefore, $\alpha$ must not be chosen too small. This indicates that the choice of
the regularization parameter must be a crucial ingredient of a reliable regularization
method. Probably the most well-known parameter choice rule in the deterministic
setting is Morozov's discrepancy principle:

\[
\alpha_{\mathrm{DP}}(\delta, g^\delta) := \sup \{ \alpha > 0 : \| F(R_\alpha(g^\delta)) - g^\delta \| \le \tau \delta \} \tag{5.6}
\]

with some parameter $\tau \ge 1$. In other words, among all estimators $R_\alpha(g^\delta)$ which can
explain the data within the noise level (times $\tau$), we choose the most stable one.
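
The discrepancy principle (5.6) can be sketched for a simple linear model problem. The example below rests on several assumptions that are not part of the text: the forward operator is diagonal with decaying entries `s_j = 1/j^2`, $R_\alpha$ is taken to be quadratic Tikhonov regularization (see Sect. 5.2), and the supremum in (5.6) is replaced by a search over the geometric grid `alpha_max * ratio**k`. All names and parameter values are illustrative.

```python
import math


def tikhonov(singular_values, data, alpha):
    """R_alpha(g) for a diagonal operator: componentwise s * g / (s^2 + alpha)."""
    return [s * g / (s * s + alpha) for s, g in zip(singular_values, data)]


def residual_norm(singular_values, data, alpha):
    """Euclidean norm of F(R_alpha(g)) - g."""
    recon = tikhonov(singular_values, data, alpha)
    return math.sqrt(sum((s * f - g) ** 2 for s, f, g in zip(singular_values, recon, data)))


def discrepancy_alpha(singular_values, data, delta, tau=1.5,
                      alpha_max=1.0, ratio=0.8, max_steps=400):
    """Largest alpha on the grid alpha_max * ratio^k with residual <= tau * delta."""
    alpha = alpha_max
    for _ in range(max_steps):
        if residual_norm(singular_values, data, alpha) <= tau * delta:
            return alpha
        alpha *= ratio
    raise ValueError("no admissible alpha found on the grid")


def run_example(n=50, delta=1e-3):
    """Return (alpha, residual, Tikhonov error, error of naive inversion)."""
    s = [1.0 / j ** 2 for j in range(1, n + 1)]        # assumed decaying singular values
    f_true = [1.0 / j ** 2 for j in range(1, n + 1)]   # assumed exact solution
    noise = [delta * (-1) ** j / math.sqrt(n) for j in range(1, n + 1)]  # norm equals delta
    g_delta = [sj * fj + ej for sj, fj, ej in zip(s, f_true, noise)]
    alpha = discrepancy_alpha(s, g_delta, delta)

    def error(f):
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(f, f_true)))

    naive = [g / sj for g, sj in zip(g_delta, s)]      # direct application of F^{-1}
    return alpha, residual_norm(s, g_delta, alpha), error(tikhonov(s, g_delta, alpha)), error(naive)


if __name__ == "__main__":
    print(run_example())
```

Scanning from large to small $\alpha$ returns the most stable reconstruction that still explains the data within $\tau \delta$, and in this example its error should be far smaller than that of naive inversion.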


Definition 5.1 A family of operators $R_\alpha : Y \to X$ parameterized by a parameter
$\alpha > 0$ together with some rule $\bar{\alpha} : [0, \infty) \times Y \to (0, \infty)$ how to choose this parameter
depending on the noise level $\delta$ and the data $g^\delta$ is called a regularization method
if the worst case error tends to 0 with the noise level in the sense that

\[
\lim_{\delta \to 0} \sup \left\{ \| R_{\bar{\alpha}(\delta, g^\delta)}(g^\delta) - f^\dagger \|_X : g^\delta \in Y, \; \| g^\delta - F(f^\dagger) \|_Y \le \delta \right\} = 0 \tag{5.7}
\]

for all $f^\dagger \in X$.

The convergence (5.7) is a minimal requirement that one expects from a regularization
method. However, it can be shown that for ill-posed problems this convergence may
be arbitrarily slow depending on $f^\dagger$. This is of course not satisfactory. Fortunately,
a-priori information on the solution $f^\dagger$, which is often available, may help. If we
know a-priori that $f^\dagger$ belongs to some set $K \subset X$, then it is often possible to derive
explicit error bounds

\[
\sup \left\{ \| R_{\bar{\alpha}(\delta, g^\delta)}(g^\delta) - f^\dagger \|_X : g^\delta \in Y, \; \| g^\delta - F(f^\dagger) \|_Y \le \delta \right\} \le \psi(\delta)
\]

for all $f^\dagger \in K$ with a function $\psi \in C([0, \infty))$ satisfying $\psi(0) = 0$.

Let us now consider statistical noise models instead of the deterministic noise
model (5.4). Often statistical data $G_t$ belong to a different space $Y'$, e.g. a space
of distributions. The distribution depends on some parameter $t$, and we assume that
$G_t \to F(f^\dagger)$ in some sense as $t \to \infty$. Let $t$ e.g. denote the number of observations
or in photonic imaging the expected number of photons. As the estimator $R_\alpha(g^{\mathrm{obs}})$
(where now $R_\alpha$ is a mapping from $Y'$ to $X$) will be a random variable, we have to
use stochastic concepts of convergence, e.g. convergence in expectation. Other convergence
concepts, in particular convergence in probability, are also used frequently.


Definition 5.2 In the setting above, a family of operators $R_\alpha : Y' \to X$ parameterized
by a parameter $\alpha > 0$ together with some parameter choice rule $\bar{\alpha} : [0, \infty) \times
Y' \to (0, \infty)$ is called a consistent estimator if

\[
\lim_{t \to \infty} E\left[ \| R_{\bar{\alpha}(t, G_t)}(G_t) - f^\dagger \|_X \right] = 0
\]

for all $f^\dagger \in X$.


Again one may further ask not only for convergence, but even rates of convergence
as $t \to \infty$ on certain subsets $K \subset X$.


5.2 Regularization Methods 


In this section we will discuss generalized Tikhonov regularization, which is given
by finding the minimum of

\[
\hat{f}_\alpha \in \operatorname{argmin}_{f \in X} \left[ S_{g^{\mathrm{obs}}}(F(f)) + \alpha R(f) \right], \tag{5.8}
\]

where $S_{g^{\mathrm{obs}}}$ is the data fidelity functional, which measures some kind of distance of
$F(f)$ and the data $g^{\mathrm{obs}}$ and causes the minimizer $\hat{f}_\alpha$ to still explain the data well,
whereas $R$ is the penalty functional which penalizes certain properties of the minimizer.
This approach is called variational regularization, as our regularized solution
is found by minimization of a functional (usually an integral functional).


5.2.1 Variational Regularization


We start with a probabilistic motivation of generalized Tikhonov regularization. 
From here until the end of this section we will consider the finite-dimensional set- 
ting X = R”, Y = R” and use boldface symbols to denote finite-dimensional vec- 
tors/mappings. We start from (5.1), wheref € R”, g € R” and F : R” — R” is some 
injective function. Given the data g we want to find the solution f of F(f) = g, but 
recall that we cannot just apply the inverse F~! as discussed in Sect.5.1. Instead 
we might estimate f by maximizing the likelihood function £(f) = P (gif), i.e. the 
probability that for a certain preimage f the data g will occur. If we assume that our 
data is normally distributed with covariance matrix o7/, then we can rearrange the 
problem by using the monotonicity of the logarithm, as well as the fact, that neither 
additive nor multiplicative constants change the extremal point: 


fac € argmax feg P (g|f) = argmax pcg, log (P(gif)) 
m = F f J 
= argmaX rep log (a elle ex (=e) 


: 1 f 1 
= argmin pepr 208 Fe - FO)? = argmin fegn > lle — F(f)||5. 
i=l 


This demonstrates the well-known fact that the maximum likelihood approach for
Gaussian noise yields the least squares method, as first used by Gauss to
predict the path of the asteroid Ceres in 1801. However, as $\hat{\mathbf{f}}_{\mathrm{ML}} = \mathbf{F}^{-1}(\mathbf{g})$ for $\mathbf{g}$ in
the range of $\mathbf{F}$, this approach has no regularizing effect. In fact it is more reasonable
to maximize $P(\mathbf{f}|\mathbf{g})$ instead of $P(\mathbf{g}|\mathbf{f})$, as our goal should be to find the solution
$\mathbf{f}$ which is most likely to have caused the observation $\mathbf{g}$, instead of just finding
any $\mathbf{f}$ which causes the observation $\mathbf{g}$ with maximal probability. This leads to the
Bayesian perspective on inverse problems with the characteristic feature that prior
to the measurements a probability distribution (the so-called prior distribution) is
assigned to the solution space $X$, modeling our prior knowledge on $\mathbf{f}$. By Bayes'
theorem we have
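$$
P(\mathbf{f}|\mathbf{g}) = \frac{P(\mathbf{g}|\mathbf{f})\,P(\mathbf{f})}{P(\mathbf{g})}, \qquad\text{i.e.}\qquad \text{posterior} = \frac{\text{likelihood}\cdot\text{prior}}{\text{evidence}}.
$$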


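Estimating $\mathbf{f}$ by maximizing the posterior $P(\mathbf{f}|\mathbf{g})$ is called the maximum a posteriori
probability (MAP) estimate. To use this approach we have to model the prior $P(\mathbf{f})$.
If we assume that $\mathbf{f}$ is normally distributed with mean $\mathbf{f}_0 \in \mathbb{R}^n$ and covariance matrix
$\tau^2 I$, then we find

$$
\begin{aligned}
\hat{\mathbf{f}}_{\mathrm{MAP}} &\in \operatorname*{argmax}_{\mathbf{f}\in\mathbb{R}^n} P(\mathbf{f}|\mathbf{g}) = \operatorname*{argmax}_{\mathbf{f}\in\mathbb{R}^n} \left[\log\left(P(\mathbf{g}|\mathbf{f})\right) + \log\left(P(\mathbf{f})\right)\right] \\
&= \operatorname*{argmin}_{\mathbf{f}\in\mathbb{R}^n} \left[\frac{1}{2\sigma^2}\sum_{i=1}^{m} (g_i - \mathbf{F}(\mathbf{f})_i)^2 + \frac{1}{2\tau^2}\sum_{j=1}^{n} (f_j - (\mathbf{f}_0)_j)^2\right] \\
&= \operatorname*{argmin}_{\mathbf{f}\in\mathbb{R}^n} \underbrace{\left[\frac12\|\mathbf{g} - \mathbf{F}(\mathbf{f})\|_2^2 + \frac{\alpha}{2}\|\mathbf{f} - \mathbf{f}_0\|_2^2\right]}_{=:\,J_\alpha(\mathbf{f})},
\end{aligned}
$$

where $\alpha = \sigma^2/\tau^2$. The functional $J_\alpha(\mathbf{f})$ is the standard (quadratic) Tikhonov functional,
and therefore MAP and Tikhonov regularization coincide in this setting.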

In photonic imaging the data is often given by photon counts, and these are
typically Poisson distributed in the absence of read-out error (compare Chap. 4).
Recall that a random variable $Z \in \mathbb{N}_0$ is called Poisson distributed with mean $\lambda > 0$,
short $Z \sim \mathrm{Pois}(\lambda)$, if $P(Z = g) = e^{-\lambda}\lambda^{g}/g!$ for all $g \in \mathbb{N}_0$. Hence the negative
log-likelihood function is given by

$$
-\log P(Z = g\,|\,\lambda) = \lambda - g\log(\lambda) + \log g! = \lambda - g + g\log\left(\frac{g}{\lambda}\right) + C_g,
$$

where $C_g$ is a constant independent of $\lambda$. Now assume that $\mathbf{g}$ is a vector of independent
Poisson distributed random variables such that $g_i \sim \mathrm{Pois}(\mathbf{F}(\mathbf{f})_i)$. It follows that the
negative log-likelihood $-\log(P(\mathbf{g}|\mathbf{f})) = -\sum_i \log P(g_i|\mathbf{F}(\mathbf{f})_i)$ is given (up to an
additive constant independent of $\mathbf{f}$) by the Kullback-Leibler divergence

$$
\mathrm{KL}(\mathbf{g}, \mathbf{F}(\mathbf{f})) := \sum_{i=1}^{m}\left[\mathbf{F}(\mathbf{f})_i - g_i + g_i\log\left(\frac{g_i}{\mathbf{F}(\mathbf{f})_i}\right)\right].
$$

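If $\mathbf{f}$ has a probability distribution $P(\mathbf{f}) = c\exp(-\mathcal{R}(\mathbf{f})/\tau^2)$, this leads to generalized
Tikhonov regularization

$$
\hat{\mathbf{f}}_\alpha \in \operatorname*{argmin}_{\mathbf{f}\in\mathbb{R}^n}\left[\mathcal{S}_{\mathbf{g}}(\mathbf{F}(\mathbf{f})) + \alpha\mathcal{R}(\mathbf{f})\right] \tag{5.9}
$$

with fidelity term $\mathcal{S}_{\mathbf{g}} = \frac12\|\mathbf{g} - \cdot\|_2^2$ for normally distributed data, $\mathcal{S}_{\mathbf{g}} = \mathrm{KL}(\mathbf{g}, \cdot)$ for
Poisson distributed data and the penalty term $\alpha\mathcal{R}$ with regularization parameter
$\alpha = \tau^{-2}$ for Poisson data and $\alpha = \sigma^2/\tau^2$ for Gaussian white noise.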
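The claim that the Poisson negative log-likelihood and the Kullback-Leibler divergence differ only by a constant independent of the prediction can be checked numerically. The following is a minimal sketch; the photon counts and the two candidate predictions are made-up illustration values, and the function names are hypothetical.

```python
import math

def poisson_nll(g, lam):
    """Negative log-likelihood -log P(g | lam) for independent Poisson counts."""
    return sum(l - gi * math.log(l) + math.lgamma(gi + 1) for gi, l in zip(g, lam))

def kl_divergence(g, lam):
    """KL(g, lam) = sum_i [lam_i - g_i + g_i log(g_i / lam_i)], with 0 log 0 := 0."""
    total = 0.0
    for gi, l in zip(g, lam):
        total += l - gi
        if gi > 0:
            total += gi * math.log(gi / l)
    return total

g = [3, 0, 7, 1]                  # hypothetical photon counts
lam_a = [2.5, 0.4, 6.0, 1.5]      # two hypothetical predictions F(f)
lam_b = [4.0, 1.0, 8.0, 0.5]

# The offset between the two functionals should not depend on the prediction.
offset_a = poisson_nll(g, lam_a) - kl_divergence(g, lam_a)
offset_b = poisson_nll(g, lam_b) - kl_divergence(g, lam_b)
```

Since both offsets agree, minimizing either functional over the prediction selects the same minimizer.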

Note that in the Bayesian setting above the regularization parameter $\alpha$ is uniquely
determined by the prior distribution and the likelihood functional. However, often
only qualitative prior knowledge on the solution is available, but the parameter $\tau$
is unknown. Then $\alpha$ has to be determined by a-posteriori parameter choice rules
analogous to the discrepancy principle (5.6) for deterministic errors.

Let us discuss a few popular choices of the penalty functional $\mathcal{R}$, which allow the
incorporation of prior information on the solution or are simply the negative logarithm
of the density of the prior in the above Bayesian setting. For a-priori known sparsity
of the solution one should choose the sparsity enforcing penalty $\mathcal{R}(\mathbf{f}) = \|\mathbf{f}\|_1$. If $\mathbf{f}$
is an image with sharp edges, then the total variation seminorm is a good choice of
$\mathcal{R}$. However, we point out that for the total variation seminorm in a Bayesian setting
there exists no straightforward useful infinite dimensional limit (see [16]). Bayesian
prior modelling in infinite dimensional settings is often considerably more involved.

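If $\mathbf{f}$ is a probability density, a frequent choice of the penalty functional is
$\mathcal{R}(\mathbf{f}) = \mathrm{KL}(\mathbf{f}, \mathbf{f}_0)$, which naturally enforces nonnegativity of the solution because
of the logarithm. Alternatively, more general inequality constraints $\mathbf{N}(\mathbf{f}) \le 0$ for
some function $\mathbf{N}$ can be incorporated into the penalty function by replacing $\mathcal{R}$ by

$$
\tilde{\mathcal{R}}(\mathbf{f}) := \begin{cases} \mathcal{R}(\mathbf{f}), & \text{if } \mathbf{N}(\mathbf{f}) \le 0, \\ \infty, & \text{else.} \end{cases}
$$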


5.2.1.1 Implementation 


In this paragraph we will discuss several possibilities to compute the minimizer $\hat{\mathbf{f}}_\alpha$
of (5.9) for a linear forward operator denoted by $\mathbf{F} = \mathbf{T}$. In the case of quadratic
Tikhonov regularization it follows from the first order optimality conditions that the
Tikhonov functional $J_\alpha$ has the unique minimizer

$$
\hat{\mathbf{f}}_\alpha = (\mathbf{T}^*\mathbf{T} + \alpha I)^{-1}(\mathbf{T}^*\mathbf{g} + \alpha\mathbf{f}_0) \tag{5.10}
$$

for all $\alpha > 0$. So in order to compute the regularized solution we have to solve the
linear system of equations

$$
\mathbf{A}\mathbf{f} = \mathbf{b} \quad\text{with}\quad \mathbf{A} := \mathbf{T}^*\mathbf{T} + \alpha I \ \text{ and } \ \mathbf{b} := \mathbf{T}^*\mathbf{g} + \alpha\mathbf{f}_0.
$$

Solving this directly by, for example, Gauss-Jordan elimination requires $O(n^3)$ operations,
and we have to store the full matrix $\mathbf{A}$. In imaging applications $n$ is typically
the number of pixels or voxels, which can be so large that storing $\mathbf{A}$ is impossible.
Therefore, we have to resort to iterative methods which access $\mathbf{A}$ only via
matrix-vector products. Such matrix-vector products can often be implemented efficiently
without setting up the matrix $\mathbf{A}$, e.g. by the fast Fourier transform (FFT) or by solving
a partial differential equation. As $\mathbf{A}$ is positive definite, the most common method
for solving $\mathbf{A}\mathbf{f} = \mathbf{b}$ is the conjugate gradient (CG) method:


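Algorithm 5.1 (CG iteration for solving $\mathbf{A}\mathbf{f} = \mathbf{b}$, $\mathbf{A} > 0$)
Initialization. Choose initial guess $\mathbf{f}_0 \in \mathbb{R}^n$. Set $\mathbf{s}_0 = \mathbf{b} - \mathbf{A}\mathbf{f}_0$; $\mathbf{d}_0 = \mathbf{s}_0$.
General Step ($l = 0, 1, \ldots$), repeated while $\mathbf{s}_l \ne 0$:

$$
\begin{aligned}
\gamma_l &= \frac{\|\mathbf{s}_l\|^2}{\langle \mathbf{d}_l, \mathbf{A}\mathbf{d}_l\rangle} \\
\mathbf{f}_{l+1} &= \mathbf{f}_l + \gamma_l \mathbf{d}_l \\
\mathbf{s}_{l+1} &= \mathbf{s}_l - \gamma_l \mathbf{A}\mathbf{d}_l \\
\beta_l &= \frac{\|\mathbf{s}_{l+1}\|^2}{\|\mathbf{s}_l\|^2} \\
\mathbf{d}_{l+1} &= \mathbf{s}_{l+1} + \beta_l \mathbf{d}_l.
\end{aligned}
$$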


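If $\mathbf{A} = \mathbf{T}^*\mathbf{T} + \alpha I$ as in Tikhonov regularization, the stopping criterion $\mathbf{s}_l \ne 0$ may
be replaced by

$$
\|\mathbf{s}_l\| > \alpha\,\mathrm{TOL}
$$

for some tolerance parameter $\mathrm{TOL} > 0$. It can be shown that $\mathbf{s}_l = \mathbf{b} - \mathbf{A}\mathbf{f}_l$
for all $l$, so that $\mathbf{f} - \mathbf{f}_l = \mathbf{A}^{-1}\mathbf{s}_l$. As $\|\mathbf{A}^{-1}\| \le 1/\alpha$, this guarantees the error
bound $\|\mathbf{f} - \mathbf{f}_l\| \le \mathrm{TOL}$ to the exact minimizer $\mathbf{f} = \mathbf{A}^{-1}\mathbf{b}$ of the Tikhonov functional.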
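The CG iteration for the Tikhonov system can be sketched in a few lines. This is a minimal illustration, not a production solver: the small dense matrix `T`, the data `g` and the helper names are assumptions made for the example, and in imaging applications `matvec`/`rmatvec` would be replaced by matrix-free routines such as FFT-based convolutions. The loop stops once the residual norm is at most `alpha * tol`, which by $\|\mathbf{A}^{-1}\| \le 1/\alpha$ bounds the error to the exact minimizer by `tol`.

```python
import math

def matvec(M, x):
    """Apply M (list of rows) to the vector x."""
    return [sum(mij * xj for mij, xj in zip(row, x)) for row in M]

def rmatvec(M, y):
    """Apply the transpose of M to y without forming the transpose."""
    return [sum(M[i][j] * y[i] for i in range(len(M))) for j in range(len(M[0]))]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def tikhonov_cg(T, g, alpha, f0, tol=1e-10, max_iter=1000):
    """Solve (T^T T + alpha I) f = T^T g + alpha f0 by conjugate gradients."""
    def apply_A(v):
        return [a + alpha * vi for a, vi in zip(rmatvec(T, matvec(T, v)), v)]

    b = [a + alpha * fi for a, fi in zip(rmatvec(T, g), f0)]
    f = list(f0)
    s = [bi - ai for bi, ai in zip(b, apply_A(f))]   # residual s_0 = b - A f_0
    d = list(s)
    for _ in range(max_iter):
        s_norm2 = dot(s, s)
        if math.sqrt(s_norm2) <= alpha * tol:        # then ||f - f_l|| <= tol
            break
        Ad = apply_A(d)
        gamma = s_norm2 / dot(d, Ad)
        f = [fi + gamma * di for fi, di in zip(f, d)]
        s = [si - gamma * adi for si, adi in zip(s, Ad)]
        beta = dot(s, s) / s_norm2
        d = [si + beta * di for si, di in zip(s, d)]
    return f

# Hypothetical 4x3 forward operator and data, for illustration only.
T = [[1.0, 0.5, 0.0], [0.0, 1.0, 0.5], [0.0, 0.0, 1.0], [1.0, 1.0, 1.0]]
g = [1.0, 2.0, 3.0, 4.0]
alpha = 0.1
f_alpha = tikhonov_cg(T, g, alpha, [0.0, 0.0, 0.0])
```

Only products with `T` and its transpose are needed, so the matrix $\mathbf{A}$ is never stored.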

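In the case of more general data fidelity terms $\mathcal{S}_{\mathbf{g}}$ and penalty terms $\mathcal{R}$ one can
use a primal-dual algorithm suggested by Chambolle and Pock [3]. To formulate this
algorithm, recall that for a functional $\mathcal{F} : \mathbb{R}^n \to \mathbb{R} \cup \{\infty\}$ the conjugate functional
is given by

$$
\mathcal{F}^*(\mathbf{s}) := \sup_{\mathbf{x}\in\mathbb{R}^n}\left[\langle \mathbf{s}, \mathbf{x}\rangle - \mathcal{F}(\mathbf{x})\right]. \tag{5.11}
$$

If $\mathcal{F}$ is convex and continuous, then $\mathcal{F}^{**} = \mathcal{F}$. For more information on this and
other basic notions of convex analysis used in the following we refer, e.g., to [17].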
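The algorithm is based on the saddle point formulation

$$
\min_{\mathbf{f}\in\mathbb{R}^n}\max_{\mathbf{z}\in\mathbb{R}^m}\left[\langle \mathbf{T}\mathbf{f}, \mathbf{z}\rangle + \alpha\mathcal{R}(\mathbf{f}) - \mathcal{S}_{\mathbf{g}}^*(\mathbf{z})\right] = \max_{\mathbf{z}\in\mathbb{R}^m}\min_{\mathbf{f}\in\mathbb{R}^n}\left[\langle \mathbf{T}\mathbf{f}, \mathbf{z}\rangle + \alpha\mathcal{R}(\mathbf{f}) - \mathcal{S}_{\mathbf{g}}^*(\mathbf{z})\right].
$$

Note that an analytic computation of the maximum leads to the original problem
(5.9), whereas a computation of the minimum leads to the dual problem

$$
\max_{\mathbf{z}\in\mathbb{R}^m} -\left[\mathcal{S}_{\mathbf{g}}^*(\mathbf{z}) + \alpha\mathcal{R}^*\left(-\tfrac{1}{\alpha}\mathbf{T}^*\mathbf{z}\right)\right]. \tag{5.12}
$$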
The algorithm requires the computation of so-called proximity operators (see
Chap. 6). For a functional $G : \mathbb{R}^n \to \mathbb{R} \cup \{\infty\}$ and a scalar $\lambda > 0$ the proximity operator
$\operatorname{prox}_{G,\lambda} : \mathbb{R}^n \to \mathbb{R}^n$ is defined by

$$
\operatorname{prox}_{G,\lambda}(\mathbf{z}) := \operatorname*{argmin}_{\mathbf{x}\in\mathbb{R}^n}\left[\tfrac12\|\mathbf{z} - \mathbf{x}\|_2^2 + \lambda G(\mathbf{x})\right].
$$

For many popular choices of $\mathcal{S}_{\mathbf{g}}$ and $\mathcal{R}$ the proximity operator can either be calculated
directly via a closed expression or there are efficient algorithms for its computation.
To give a simple example, for $G(\mathbf{x}) = \frac12\|\mathbf{x}\|_2^2$ one can calculate the proximity operator
just by (5.10) to be $(1 + \lambda)^{-1}\mathbf{z}$.

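One also needs to evaluate the proximity operator of the Fenchel convex conjugate
$G^*$, which can be done by Moreau's identity

$$
\operatorname{prox}_{G^*,\lambda}(\mathbf{z}) = \mathbf{z} - \lambda\operatorname{prox}_{G,1/\lambda}\left(\tfrac{1}{\lambda}\mathbf{z}\right).
$$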
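Moreau's identity can be verified on two functionals whose conjugates are known in closed form. The sketch below assumes $G = \|\cdot\|_1$, whose conjugate is the indicator function of the unit ball of the maximum norm (so its proximity operator is componentwise clipping to $[-1, 1]$), and $G = \frac12\|\cdot\|_2^2$, which is its own conjugate. All function names are illustrative.

```python
def prox_half_sq(z, lam):
    """prox of G(x) = 0.5*||x||_2^2 with weight lam: the closed form z / (1 + lam)."""
    return [zi / (1.0 + lam) for zi in z]

def prox_l1(z, lam):
    """prox of G(x) = ||x||_1 with weight lam: componentwise soft-thresholding."""
    return [max(abs(zi) - lam, 0.0) * (1.0 if zi > 0 else -1.0) for zi in z]

def prox_conjugate(prox_G, z, lam):
    """prox of the conjugate G* with weight lam, computed via Moreau's identity."""
    inner = prox_G([zi / lam for zi in z], 1.0 / lam)
    return [zi - lam * pi for zi, pi in zip(z, inner)]

z = [2.5, -0.3, 0.8, -4.0]
lam = 0.7

# For G = ||.||_1 the result should be the projection onto [-1, 1]^n.
via_moreau_l1 = prox_conjugate(prox_l1, z, lam)
clipped = [max(-1.0, min(1.0, zi)) for zi in z]

# For G = 0.5*||.||^2 the conjugate is G itself, so both sides should agree.
via_moreau_sq = prox_conjugate(prox_half_sq, z, lam)
direct_sq = prox_half_sq(z, lam)
```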


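Algorithm 5.2 (Chambolle-Pock primal dual algorithm)
Initialization. Choose $(\mathbf{f}_0, \mathbf{p}_0) \in \mathbb{R}^n \times \mathbb{R}^m$ and set $\tilde{\mathbf{f}}_0 = \mathbf{f}_0$.
General Step ($k = 0, 1, \ldots$). Choose parameters $\tau_X^k, \tau_Y^k > 0$, $\theta_k \in [0, 1]$ and let

$$
\begin{aligned}
\mathbf{p}_{k+1} &= \operatorname{prox}_{\mathcal{S}_{\mathbf{g}}^*,\,\tau_Y^k}\left(\mathbf{p}_k + \tau_Y^k \mathbf{T}\tilde{\mathbf{f}}_k\right) \\
\mathbf{f}_{k+1} &= \operatorname{prox}_{\alpha\mathcal{R},\,\tau_X^k}\left(\mathbf{f}_k - \tau_X^k \mathbf{T}^*\mathbf{p}_{k+1}\right) \\
\tilde{\mathbf{f}}_{k+1} &= \mathbf{f}_{k+1} + \theta_k\left(\mathbf{f}_{k+1} - \mathbf{f}_k\right).
\end{aligned}
$$

For constant parameters $\tau_X^k = \tau_X > 0$, $\tau_Y^k = \tau_Y > 0$ with $\tau_X\tau_Y\|\mathbf{T}\|^2 < 1$ and $\theta_k = 1$
it has been shown [3] that $\mathbf{f}_k$ converges to a solution of (5.9) and $\mathbf{p}_k$ converges to a
solution of the corresponding dual problem. Under certain assumptions on $\mathcal{S}_{\mathbf{g}}$ and $\mathcal{R}$
special choices for the parameters $\tau_X^k, \tau_Y^k > 0$ and $\theta_k$ will speed up the convergence.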
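As an illustration of the primal-dual iteration, the sketch below treats the special case $\mathcal{S}_{\mathbf{g}} = \frac12\|\mathbf{g} - \cdot\|_2^2$ and $\mathcal{R} = \|\cdot\|_1$ with $\theta_k = 1$ and constant step sizes. These choices, the small diagonal test operator and all names are assumptions made for the example. For this fidelity term the conjugate is $\mathcal{S}_{\mathbf{g}}^*(\mathbf{z}) = \frac12\|\mathbf{z}\|_2^2 + \langle \mathbf{z}, \mathbf{g}\rangle$, so both proximity operators have closed forms.

```python
def soft_threshold(x, t):
    """Componentwise soft-thresholding, the prox of t*||.||_1."""
    return [max(abs(xi) - t, 0.0) * (1.0 if xi > 0 else -1.0) for xi in x]

def chambolle_pock_l1(T, g, alpha, tau_x, tau_y, n_iter=500):
    """Minimize 0.5*||T f - g||^2 + alpha*||f||_1 with theta_k = 1.

    T is a dense matrix given as a list of rows (for illustration only).
    """
    m, n = len(T), len(T[0])
    f = [0.0] * n
    f_bar = list(f)
    p = [0.0] * m
    for _ in range(n_iter):
        # Dual step: closed-form prox of S_g^*(z) = 0.5*||z||^2 + <z, g>.
        Tf_bar = [sum(T[i][j] * f_bar[j] for j in range(n)) for i in range(m)]
        p = [(p[i] + tau_y * (Tf_bar[i] - g[i])) / (1.0 + tau_y) for i in range(m)]
        # Primal step: prox of alpha*||.||_1 with weight tau_x.
        Ttp = [sum(T[i][j] * p[i] for i in range(m)) for j in range(n)]
        f_new = soft_threshold([f[j] - tau_x * Ttp[j] for j in range(n)], tau_x * alpha)
        # Extrapolation step with theta_k = 1.
        f_bar = [2.0 * f_new[j] - f[j] for j in range(n)]
        f = f_new
    return f

# Diagonal toy operator with ||T|| = 2, so tau_x * tau_y * ||T||^2 = 0.81 < 1.
T = [[2.0, 0.0], [0.0, 1.0]]
g = [3.0, 0.5]
f_hat = chambolle_pock_l1(T, g, alpha=1.0, tau_x=0.45, tau_y=0.45)
```

Because the toy operator is diagonal, the minimizer can be computed by hand for comparison: the first component solves $2(2f - 3) + 1 = 0$, giving $f = 1.25$, and the second component is thresholded to zero.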

To compute the minimizer of (5.9) with additional inequality constraints one can
apply semismooth Newton methods for which we refer to [19].


5.2.2 Iterative Regularization 


Whereas in the previous subsection we have discussed iterative methods to com- 
pute the minimum of a generalized Tikhonov functional, in the following we will 
discuss iterative methods for the solution of F (f) = g without prior regularization. 
Typically the choice of the stopping index plays the role of the choice of the
regularization parameter $\alpha$. A motivation for the use of iterative regularization methods is
the fact that the Tikhonov functional is not convex in general for nonlinear operators.
Therefore, it cannot be guaranteed that an approximation to the global minimum 
of the Tikhonov functional can be computed. For further information on iterative 
regularization methods we refer to the monograph [15]. 


5.2.2.1 Landweber Iteration 

Landweber iteration can be derived as a method of steepest descent for the cost
functional $J(\mathbf{f}) = \frac12\|\mathbf{F}(\mathbf{f}) - \mathbf{g}\|^2$. As the direction of steepest descent is given by the
negative gradient $-J'(\mathbf{f}) = -\mathbf{F}'(\mathbf{f})^*(\mathbf{F}(\mathbf{f}) - \mathbf{g})$, this leads to the iteration formula

$$
\mathbf{f}_{k+1} = \mathbf{f}_k - \mu\,\mathbf{F}'(\mathbf{f}_k)^*\left(\mathbf{F}(\mathbf{f}_k) - \mathbf{g}\right) \tag{5.13}
$$

with a step size parameter $\mu$. This parameter should be chosen such that
$\mu\|\mathbf{F}'(\mathbf{f})^*\mathbf{F}'(\mathbf{f})\| \le 1$ for all $\mathbf{f}$. Since a too small choice of $\mu$ slows down convergence
considerably, it is advisable to compute the operator norm of $\mathbf{F}'(\mathbf{f})^*\mathbf{F}'(\mathbf{f})$ by
a few iterations of the power method.
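For a linear operator $\mathbf{F} = \mathbf{T}$ the step size rule and the iteration (5.13) can be sketched as follows. The small matrix, the exact (noise-free) data and the safety factor 0.9 are assumptions made for the example; with noisy data the iteration would be stopped early, e.g. by the discrepancy principle, rather than run for a fixed number of steps.

```python
import math

def matvec(M, x):
    return [sum(mij * xj for mij, xj in zip(row, x)) for row in M]

def rmatvec(M, y):
    """Apply the transpose of M to y."""
    return [sum(M[i][j] * y[i] for i in range(len(M))) for j in range(len(M[0]))]

def power_method_norm(T, n_iter=50):
    """Estimate the operator norm of T^T T (its largest eigenvalue)."""
    v = [1.0] * len(T[0])
    lam = 0.0
    for _ in range(n_iter):
        w = rmatvec(T, matvec(T, v))
        lam = math.sqrt(sum(wi * wi for wi in w))
        v = [wi / lam for wi in w]
    return lam

def landweber(T, g, f0, mu, n_iter):
    """Landweber iteration f <- f - mu * T^T (T f - g) for linear T."""
    f = list(f0)
    for _ in range(n_iter):
        residual = [a - b for a, b in zip(matvec(T, f), g)]
        grad = rmatvec(T, residual)
        f = [fi - mu * gi for fi, gi in zip(f, grad)]
    return f

T = [[1.0, 0.5], [0.0, 1.0], [1.0, 1.0]]   # hypothetical well-conditioned operator
f_true = [1.0, -2.0]
g = matvec(T, f_true)                      # exact data, no noise
op_norm = power_method_norm(T)
mu = 0.9 / op_norm                         # safety factor keeps mu * ||T^T T|| below 1
f_rec = landweber(T, g, [0.0, 0.0], mu, n_iter=500)
```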

Under certain conditions on the operator F it has been shown in [7] in a Hilbert 
space setting that Landweber iteration with the discrepancy principle as stopping 
rule is a regularization method in a sense analogous to Definition 5.1, i.e. the worst 
case error tends to 0 with the noise level for a sufficiently good initial guess. 


5.2.2.2 Regularized Newton Methods 


Although Landweber iteration often makes good progress in the first few iterations, 
asymptotic convergence is very slow. Faster convergence may be expected from 
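Newton-type methods, which solve a linear system of equations or some minimization
problem using the first order Taylor approximation

$$
\mathbf{F}(\mathbf{f}) \approx \mathbf{F}(\mathbf{f}_k) + \mathbf{F}'(\mathbf{f}_k)(\mathbf{f} - \mathbf{f}_k) \tag{5.14}
$$

around a current iterate $\mathbf{f}_k$. Plugging this approximation into the quadratic Tikhonov
functional and using the last iterate as initial guess leads to the Levenberg-Marquardt
algorithm

$$
\mathbf{f}_{k+1} = \operatorname*{argmin}_{\mathbf{f}\in\mathbb{R}^n}\left[\tfrac12\|\mathbf{F}'(\mathbf{f}_k)(\mathbf{f} - \mathbf{f}_k) + \mathbf{F}(\mathbf{f}_k) - \mathbf{g}\|_2^2 + \tfrac{\alpha_k}{2}\|\mathbf{f} - \mathbf{f}_k\|_2^2\right].
$$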
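In the scalar case the quadratic minimization problem of the Levenberg-Marquardt step can be solved by hand, which gives the update $f_{k+1} = f_k + F'(f_k)(g - F(f_k))/(F'(f_k)^2 + \alpha_k)$. The sketch below applies it to the toy problem $F(f) = e^f$, $g = 2$. The geometric decay $\alpha_k = \alpha_0 q^k$ and the exact data are assumptions made for the example.

```python
import math

def levenberg_marquardt(F, dF, g, f0, alpha0=1.0, q=0.5, n_iter=30):
    """Scalar Levenberg-Marquardt iteration with alpha_k = alpha0 * q**k."""
    f = f0
    alpha = alpha0
    for _ in range(n_iter):
        J = dF(f)
        # Minimizer of 0.5*(J*(x - f) + F(f) - g)**2 + 0.5*alpha*(x - f)**2 over x.
        f = f + J * (g - F(f)) / (J * J + alpha)
        alpha *= q
    return f

# Toy problem: solve exp(f) = 2 starting from f = 0.
f_rec = levenberg_marquardt(math.exp, math.exp, g=2.0, f0=0.0)
```

As $\alpha_k$ decreases the update approaches a plain Newton step, which explains the fast asymptotic convergence compared with Landweber iteration.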


For a convergence analysis we refer to [6]. The minimization problems can be solved
efficiently by Algorithm 5.1 without the need to set up the full Jacobi matrices $\mathbf{F}'(\mathbf{f}_k)$
in each step. Newton-type methods converge considerably faster than Landweber
iteration. In fact the number of Landweber steps which is necessary to achieve an
accuracy comparable to $k$ Newton steps increases exponentially with $k$ (cf. [4]). On
the other hand, each iteration step is more expensive. Which of the two methods is
more efficient may depend on the size of the noise level. Newton-type methods are
typically favorable for small noise levels. Plugging the first order Taylor approximation
(5.14) into the generalized Tikhonov functional (5.8) leads to the iteration
formula

$$
\mathbf{f}_{k+1} \in \operatorname*{argmin}_{\mathbf{f}}\left[\mathcal{S}_{\mathbf{g}^{\mathrm{obs}}}\left(\mathbf{F}(\mathbf{f}_k) + \mathbf{F}'(\mathbf{f}_k)(\mathbf{f} - \mathbf{f}_k)\right) + \alpha_k\mathcal{R}(\mathbf{f})\right]. \tag{5.15}
$$

If $\mathcal{S}_{\mathbf{g}^{\mathrm{obs}}}(\mathbf{g}) = \frac12\|\mathbf{g} - \mathbf{g}^{\mathrm{obs}}\|^2$ and $\mathcal{R}(\mathbf{f}) = \frac12\|\mathbf{f} - \mathbf{f}_0\|^2$, this leads to the commonly used
iteratively regularized Gauss-Newton method, where compared to the Levenberg-Marquardt
method the penalty term is replaced by $\frac{\alpha_k}{2}\|\mathbf{f} - \mathbf{f}_0\|_2^2$. Note that if $\mathcal{S}_{\mathbf{g}^{\mathrm{obs}}}$
and $\mathcal{R}$ are convex, a convex optimization problem has to be solved in each Newton
step, which can be done, e.g., by Algorithm 5.2. For a convergence analysis of (5.15)
including the case of Poisson data we refer to [12].

5.3 Error Estimates


5.3.1 General Error Bounds for Variational Regularization 


Under the deterministic noise model (5.4) we are now looking for error bounds of
the form

$$
\|\hat f_\alpha - f^\dagger\|^2 \le \phi(\delta, \alpha) \tag{5.16}
$$

for the Tikhonov estimator $\hat f_\alpha$. In a second step the right hand side may be minimized
over $\alpha$. As discussed in Sect. 5.1.4 such estimates can only be obtained under
additional conditions on $f^\dagger$, which are called source conditions. There are several
forms of such conditions. Nowadays, starting with [8], such conditions are often
formulated in the form of variational inequalities. For the sake of simplicity we confine
ourselves to the case of Hilbert spaces with quadratic functionals $\mathcal{R}$ and $\mathcal{S}_{g^{\mathrm{obs}}}$ here,
although the concepts can be generalized with little additional effort to general convex
$\mathcal{R}$ and $\mathcal{S}_{g^{\mathrm{obs}}}$. For a concave, monotonically increasing and continuous function
$\psi : [0, \infty) \to [0, \infty)$ with $\psi(0) = 0$ we require that

$$
\forall f \in X:\quad \frac14\|f - f^\dagger\|^2 \le \frac12\|f\|^2 - \frac12\|f^\dagger\|^2 + \psi\left(\|F(f) - F(f^\dagger)\|^2\right). \tag{5.17}
$$
Such conditions are not easy to interpret at first sight, and we will come back to this in
Sect. 5.3.2. However, they have been shown to be necessary for certain convergence
rates of Tikhonov regularization and other regularization methods (see [11]), and
sufficiency can be shown quite easily:


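Theorem 5.3 If $f^\dagger$ fulfills (5.17), then the Tikhonov estimator $\hat f_\alpha$ in (5.8) satisfies
the error estimate

$$
\frac14\|\hat f_\alpha - f^\dagger\|^2 \le \frac{\delta^2}{\alpha} + (-\psi)^*\left(-\frac{1}{4\alpha}\right), \tag{5.18}
$$

where $\psi^*(t) := \sup_{s \ge 0}\,(st - \psi(s))$ denotes the conjugate function (see (5.11)).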


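Proof By definition of $\hat f_\alpha$ and our noise model we have

$$
\frac12\|F(\hat f_\alpha) - g^{\mathrm{obs}}\|^2 + \frac{\alpha}{2}\|\hat f_\alpha\|^2 \le \frac12\|F(f^\dagger) - g^{\mathrm{obs}}\|^2 + \frac{\alpha}{2}\|f^\dagger\|^2 \le \frac{\delta^2}{2} + \frac{\alpha}{2}\|f^\dagger\|^2,
$$

so together with our assumption (5.17) we find

$$
\begin{aligned}
\frac14\|\hat f_\alpha - f^\dagger\|^2 &\le \frac12\|\hat f_\alpha\|^2 - \frac12\|f^\dagger\|^2 + \psi\left(\|F(\hat f_\alpha) - F(f^\dagger)\|^2\right) \\
&\le \frac{\delta^2}{2\alpha} - \frac{1}{2\alpha}\|F(\hat f_\alpha) - g^{\mathrm{obs}}\|^2 + \psi\left(\|F(\hat f_\alpha) - F(f^\dagger)\|^2\right).
\end{aligned}
$$

By the parallelogram law we have for all $x, y, z$ in a Hilbert space that

$$
\|x - y\|^2 = 2\|x - z\|^2 + 2\|z - y\|^2 - \|x + y - 2z\|^2 \le 2\|x - z\|^2 + 2\|z - y\|^2.
$$

Apply this with $x = F(\hat f_\alpha)$, $y = F(f^\dagger)$ and $z = g^{\mathrm{obs}}$ to find

$$
\frac{1}{4\alpha}\|F(\hat f_\alpha) - F(f^\dagger)\|^2 \le \frac{\delta^2}{2\alpha} + \frac{1}{2\alpha}\|F(\hat f_\alpha) - g^{\mathrm{obs}}\|^2,
$$

so that finally we have

$$
\begin{aligned}
\frac14\|\hat f_\alpha - f^\dagger\|^2 &\le \frac{\delta^2}{\alpha} - \frac{1}{4\alpha}\|F(\hat f_\alpha) - F(f^\dagger)\|^2 + \psi\left(\|F(\hat f_\alpha) - F(f^\dagger)\|^2\right) \\
&\le \frac{\delta^2}{\alpha} + \sup_{s \ge 0}\left[-\frac{s}{4\alpha} + \psi(s)\right] = \frac{\delta^2}{\alpha} + (-\psi)^*\left(-\frac{1}{4\alpha}\right). \qquad\square
\end{aligned}
$$
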
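The two most commonly used types of functions $\psi$ in the literature,

$$
\psi(t) = t^{\nu} \quad\text{and}\quad \psi(t) = (-\log t)^{-p}(1 + o(1)), \quad\text{as } t \to 0,
$$

are referred to as Hölder and logarithmic source functions, respectively. For these
functions we obtain

$$
(-\psi)^*\left(-\frac1t\right) = c\,t^{\frac{\nu}{1-\nu}} \quad\text{and}\quad (-\psi)^*\left(-\frac1t\right) = (-\log t)^{-p}(1 + o(1)), \quad\text{as } t \to 0,
$$

respectively.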
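The Hölder case can be worked out by hand. The following derivation is a sketch under the assumption $\psi(s) = s^{\nu}$ with $0 < \nu < 1$; the explicit value of the constant is computed here and is not taken from the text.

```latex
\begin{aligned}
(-\psi)^*\!\left(-\tfrac1t\right)
  &= \sup_{s \ge 0}\left[-\tfrac{s}{t} + s^{\nu}\right],
  \qquad \text{maximized where } \nu s^{\nu-1} = \tfrac1t,
  \ \text{i.e. } s_* = (\nu t)^{\frac{1}{1-\nu}}, \\
  &= s_*^{\nu} - \tfrac{s_*}{t}
   = \nu^{\frac{\nu}{1-\nu}}\, t^{\frac{\nu}{1-\nu}} - \nu^{\frac{1}{1-\nu}}\, t^{\frac{\nu}{1-\nu}}
   = (1-\nu)\,\nu^{\frac{\nu}{1-\nu}}\; t^{\frac{\nu}{1-\nu}} .
\end{aligned}
```

So in this case the constant is $c = (1-\nu)\,\nu^{\nu/(1-\nu)}$, which depends only on $\nu$.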


Note that the two terms on the right hand side of (5.18) correspond to the error
splitting (5.5). The following theorem gives an optimal choice of $\alpha$ balancing these
two terms:


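Theorem 5.4 If under the assumptions of Theorem 5.3 $\psi$ is differentiable, the infimum
of the right hand side of (5.18) is attained at $\bar\alpha(\delta)$ if and only if

$$
\frac{1}{4\bar\alpha} = \psi'(4\delta^2),
$$

and

$$
\frac14\|\hat f_{\bar\alpha} - f^\dagger\|^2 \le \psi(4\delta^2).
$$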


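Proof Note from the definition of $(-\psi)^*$ that $(-\psi)^*(t^*) \ge tt^* + \psi(t)$ for all $t \ge 0$,
$t^* \in \mathbb{R}$. Further, equality holds true if and only if $t^* = -\psi'(t)$, as for this choice of
$t^*$ the concave function $\tilde t \mapsto \tilde t t^* + \psi(\tilde t)$ attains its maximum at $t$ and thus in
particular $tt^* + \psi(t) \ge (-\psi)^*(t^*)$. Therefore we have with $t = 4\delta^2$ and $t^* = -\frac{1}{4\alpha}$
that

$$
\frac{\delta^2}{\alpha} + (-\psi)^*\left(-\frac{1}{4\alpha}\right) \ge \psi(4\delta^2) \quad\text{for all } \alpha > 0
$$

and $\frac{\delta^2}{\alpha} + (-\psi)^*\left(-\frac{1}{4\alpha}\right) = \psi(4\delta^2)$ if and only if $-\frac{1}{4\alpha} = -\psi'(4\delta^2)$. $\square$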
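For a Hölder source function the optimal parameter and the resulting rate follow directly from Theorem 5.4. The computation below is a worked sketch assuming $\psi(t) = t^{\nu}$ with $0 < \nu < 1$.

```latex
\psi(t) = t^{\nu}: \qquad
\frac{1}{4\bar\alpha} = \psi'(4\delta^2) = \nu\,(4\delta^2)^{\nu-1}
\;\Longrightarrow\;
\bar\alpha = \frac{(4\delta^2)^{1-\nu}}{4\nu},
\qquad
\frac14\|\hat f_{\bar\alpha} - f^\dagger\|^2 \le (4\delta^2)^{\nu}
\;\Longrightarrow\;
\|\hat f_{\bar\alpha} - f^\dagger\| \le 2^{1+\nu}\,\delta^{\nu}.
```

The regularization parameter thus scales like $\delta^{2(1-\nu)}$ and the reconstruction error like $\delta^{\nu}$.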


It can be shown that the same type of error bound can be obtained by a version of
the discrepancy principle (5.6) which does not require knowledge of the function $\psi$
describing abstract smoothness of the unknown solution $f^\dagger$ [2]. This is an advantage
in practice, because such knowledge is often unrealistic.


5.3.2 Interpretation of Variational Source Conditions 


5.3.2.1 Connection to Stability Estimates 


Variational source conditions (5.17) are closely related to so-called stability estimates.
In fact, if (5.17) holds true for all $f^\dagger \in K \subset X$, then all $f_1, f_2 \in K$ satisfy the stability
estimate

$$
\frac14\|f_1 - f_2\|^2 \le \psi\left(\|F(f_1) - F(f_2)\|^2\right),
$$

with the same function $\psi$, since one of the terms $\pm\frac12\left(\|f_1\|^2 - \|f_2\|^2\right)$ will be
non-positive. There exists a considerable literature on such stability estimates (see e.g.
[1, 14]). However, it is unclear if stability estimates also imply variational source
conditions, as two difficulties have to be overcome. Firstly, the term $\|f\|^2 - \|f^\dagger\|^2$
might be negative, and secondly one would have to extend the estimate from the set
$K$ to the whole space $X$.


5.3.2.2 General Strategy for the Verification of Variational Source 
Conditions 


In general the rate at which the error of reconstruction methods converges to 0 as the
noise level tends to 0 in inverse problems depends on two factors: the smoothness of
the solution $f^\dagger$ and the degree of ill-posedness of $F$. We will describe both in terms
of a family of finite dimensional subspaces $V_n \subset X$ or the corresponding orthogonal
projections $P_n : X \to V_n$. The smoothness of $f^\dagger$ will be measured by how fast the
best approximations $P_n f^\dagger$ in $V_n$ converge to $f^\dagger$:

$$
\|(I - P_n) f^\dagger\|_X \le \kappa_n, \tag{5.19a}
$$
$$
\lim_{n\to\infty} \kappa_n = 0. \tag{5.19b}
$$

Inequalities of this type are called Jackson inequalities, and they are well studied
for many types of subspaces $V_n$ such as spaces of polynomials, trigonometric
polynomials, splines, or finite elements. We will illustrate this for the case of trigonometric
polynomials below.

Concerning the degree of ill-posedness, recall that any linear mapping on a finite
dimensional space is continuous. Therefore, a linear, injective operator $T$ restricted
to a finite dimensional space $V_n$ has a continuous inverse $(T|_{V_n})^{-1}$ defined on $T(V_n)$.
However, the norm of these operators will grow with $n$, and the rate of growth
may be used to measure the degree of ill-posedness. In the nonlinear case we may
look at Lipschitz constants $\sigma_n$ such that $\|P_n f^\dagger - P_n f\| \le \sigma_n\|F(P_n f^\dagger) - F(P_n f)\|$.
However, to obtain optimal results it turns out that we need estimates of inner products
of $P_n f^\dagger - P_n f$ with $f^\dagger$. Moreover, on the right hand side we have to deal with
$F(f^\dagger) - F(f)$ rather than $F(P_n f^\dagger) - F(P_n f)$:

$$
\langle P_n f^\dagger, f^\dagger - f\rangle \le \sigma_n\|F(f^\dagger) - F(f)\|_Y \quad\text{for all } f \in X. \tag{5.19c}
$$

The growth rate of $\sigma_n$ describes what we will call local degree of ill-posedness of $F$
at $f^\dagger$.


Theorem 5.5 Let $X$ and $Y$ be Hilbert spaces and suppose that there exists a sequence
of projection operators $P_n : X \to X$ and sequences $(\kappa_n)_{n\in\mathbb{N}}$, $(\sigma_n)_{n\in\mathbb{N}}$ of positive
numbers such that (5.19) holds true for all $n \in \mathbb{N}$.

Then $f^\dagger$ fulfills a variational source condition (5.17) with the concave, continuous,
increasing function

$$
\psi(\tau) := \inf_{n\in\mathbb{N}}\left[\sigma_n\sqrt{\tau} + \kappa_n^2\right], \tag{5.20}
$$

which satisfies $\psi(0) = 0$.


Proof By straightforward computations we see that the variational source condition
(5.17) has the equivalent form

$$
\forall f \in X:\quad \langle f^\dagger, f^\dagger - f\rangle \le \frac14\|f - f^\dagger\|^2 + \psi\left(\|F(f) - F(f^\dagger)\|^2\right).
$$

Using (5.19a), (5.19c), and the Cauchy-Schwarz inequality we get for each $n \in \mathbb{N}$
that

$$
\begin{aligned}
\langle f^\dagger, f^\dagger - f\rangle &= \langle P_n f^\dagger, f^\dagger - f\rangle + \langle (I - P_n) f^\dagger, f^\dagger - f\rangle \\
&\le \sigma_n\|F(f^\dagger) - F(f)\|_Y + \kappa_n\|f^\dagger - f\|_X \\
&\le \sigma_n\|F(f^\dagger) - F(f)\|_Y + \kappa_n^2 + \frac14\|f^\dagger - f\|_X^2.
\end{aligned}
$$

Taking the infimum over the right hand side with respect to $n \in \mathbb{N}$ yields the variational
source condition with $\psi$ from (5.20), evaluated at $\tau = \|F(f^\dagger) - F(f)\|_Y^2$. As $\psi$ is defined
by an infimum over concave and increasing functions, it is also increasing and concave.
Moreover, (5.19b) implies $\psi(0) = 0$. $\square$

5.3.2.3 Example: Numerical Differentiation


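Recall that the trigonometric monomials $e_n(x) := e^{2\pi i n x}$ form an orthonormal basis
$\{e_n : n \in \mathbb{Z} \setminus \{0\}\}$ of $L^2_\diamond([0, 1])$, i.e. every function $f \in L^2_\diamond([0, 1])$ can be expressed as
$f(x) = \sum_{n\in\mathbb{Z}\setminus\{0\}} \hat f(n) e_n(x)$ with $\hat f(n) = \int_0^1 f(x)\overline{e_n(x)}\,dx$. Further note that from
the definition (5.3) of the forward operator we get

$$
F(e_n) = \frac{1}{2\pi i n}e_n, \qquad n \in \mathbb{Z} \setminus \{0\},
$$

and that the kth derivative of $f$ has Fourier coefficients $\widehat{f^{(k)}}(n) = (2\pi i n)^k \hat f(n)$.
Therefore, the norm

$$
\|f\|_{H^s} := \left(\sum_{n\in\mathbb{Z}\setminus\{0\}} (2\pi |n|)^{2s}\,|\hat f(n)|^2\right)^{1/2}
$$

(called Sobolev norm of order $s \ge 0$) fulfills $\|f\|_{H^k} = \|f^{(k)}\|_{L^2}$ for $k \in \mathbb{N}_0$, but it
also allows us to measure non-integer degrees of smoothness of $f$.
We choose $P_n$ as the orthogonal projection

$$
P_n f := \sum_{0 < |m| \le n} \hat f(m)\, e_m.
$$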


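Suppose that $f^\dagger$ has finite Sobolev norm $\|f^\dagger\|_{H^s}$ (for $s = k \in \mathbb{N}$ this means that the kth
distributional derivative of $f^\dagger$ belongs to $L^2([0, 1])$). Then

$$
\|(I - P_n) f^\dagger\|_{L^2}^2 = \sum_{|m| > n} |\hat f^\dagger(m)|^2 = \sum_{|m| > n} (2\pi |m|)^{-2s}(2\pi |m|)^{2s}|\hat f^\dagger(m)|^2 \le (2\pi n)^{-2s}\|f^\dagger\|_{H^s}^2,
$$

which shows that (5.19a) and (5.19b) are satisfied with $\kappa_n = (2\pi n)^{-s}\|f^\dagger\|_{H^s}$. Moreover,
we have that

$$
\begin{aligned}
\langle P_m f^\dagger, f^\dagger - f\rangle &= \sum_{0 < |n| \le m} (-2\pi i n)\,\hat f^\dagger(n)\,\overline{\frac{1}{2\pi i n}\left(\hat f^\dagger(n) - \hat f(n)\right)} \\
&\le \left(\sum_{0 < |n| \le m} (2\pi n)^2\,|\hat f^\dagger(n)|^2\right)^{1/2} \|F(f^\dagger) - F(f)\|_{L^2} \\
&\le (2\pi m)^{1-s}\,\|f^\dagger\|_{H^s}\,\|F(f^\dagger) - F(f)\|_{L^2}
\end{aligned}
$$

for $s \in (0, 1]$, so (5.19c) is satisfied with $\sigma_m := (2\pi m)^{1-s}\|f^\dagger\|_{H^s}$. Choosing the parameter
$m \approx \|f^\dagger\|_{H^s}^{1/(1+s)}\,\tau^{-1/(2(1+s))}$, at which the infimum in (5.20) is attained approximately,
we see that there exists a constant $C > 0$ such that the variational source condition
(5.17) is satisfied with $\psi(\tau) = C\|f^\dagger\|_{H^s}^{2/(1+s)}\,\tau^{s/(1+s)}$, and from Theorem 5.4 we obtain
the error bound

$$
\|\hat f_{\bar\alpha} - f^\dagger\|_{L^2} = O\left(\delta^{\frac{s}{s+1}}\right). \tag{5.21}
$$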


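It can be shown that this rate is optimal in the sense that there exists no reconstruction
method $R$ for which

$$
\sup\left\{\|R(g^{\mathrm{obs}}) - f^\dagger\|_{L^2} : \|f^\dagger\|_{H^s} \le 1,\ \|F(f^\dagger) - g^{\mathrm{obs}}\|_{L^2} \le \delta\right\} = o\left(\delta^{\frac{s}{s+1}}\right) \quad\text{as } \delta \to 0.
$$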
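The Hölder exponent of the source function in this example can be checked numerically by evaluating the infimum in (5.20) over a finite range of indices. The sketch below assumes $\sigma_m = (2\pi m)^{1-s}\rho$ and $\kappa_m = (2\pi m)^{-s}\rho$ with $\rho = \|f^\dagger\|_{H^s} = 1$; the search bound `m_max` and the two sample values of $\tau$ are arbitrary choices for illustration.

```python
import math

def psi(tau, s, rho=1.0, m_max=20000):
    """Evaluate inf_m [sigma_m * sqrt(tau) + kappa_m**2] over m = 1, ..., m_max."""
    best = float("inf")
    for m in range(1, m_max + 1):
        x = 2.0 * math.pi * m
        value = x ** (1.0 - s) * rho * math.sqrt(tau) + x ** (-2.0 * s) * rho ** 2
        best = min(best, value)
    return best

s = 0.5
tau_1, tau_2 = 1e-8, 1e-12
# Slope of log(psi) against log(tau); the predicted exponent is s / (1 + s).
slope = math.log(psi(tau_1, s) / psi(tau_2, s)) / math.log(tau_1 / tau_2)
```

For $s = 1/2$ the measured slope should be close to $1/3$, in line with $\psi(\tau) \sim \tau^{s/(1+s)}$.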


5.3.2.4 Example: Fluorescence Microscopy 


Similarly one can proceed for the example of fluorescence microscopy. As one
has to work here with $L^2$ spaces rather than $L^2_\diamond$ spaces, the Sobolev norm is
defined by $\|f\|_{H^s}^2 = \sum_{n\in\mathbb{Z}^d} (1 + |n|^2)^s\,|\hat f(n)|^2$. Assuming that the convolution kernel
is $a$-times smoothing ($a > 0$) in the sense that $\|F(f)\|_{H^a} \sim \|f\|_{H^0}$, which is equivalent
to the existence of two constants $0 < c < C$ such that the Fourier transform of
the convolution kernel $k$ fulfills

$$
c\,(1 + |\xi|^2)^{-a/2} \le |\hat k(\xi)| \le C\,(1 + |\xi|^2)^{-a/2},
$$

one can show that $\|f^\dagger\|_{H^s} < \infty$ for $s \in (0, a]$ implies a variational source condition
with $\psi(t) \sim t^{\frac{s}{s+a}}$ and an error bound

$$
\|\hat f_{\bar\alpha} - f^\dagger\|_{L^2} = O\left(\delta^{\frac{s}{s+a}}\right). \tag{5.22}
$$


Again this estimate is optimal in the sense explained above. 


5.3.2.5 Extensions 


Variational source conditions with a given Hölder source function actually hold true
on a slightly larger set. In the typical situation where the marker density of the
investigated specimen is constant (or smooth) up to jumps, it fulfills the same
variational source condition with $s = 1/2$, although $f^\dagger \in H^s$ if and only if $s < 1/2$.
The sets on which a variational source condition is satisfied can be characterized in
terms of Besov spaces $B^s_{2,\infty}$, and bounded subsets of such spaces are also the largest
sets on which Hölder-type error bounds like (5.21) and (5.22) are satisfied with
uniform constants (see [11]).

In the case where the convolution kernel is infinitely smoothing, e.g. if the kernel
is a Gaussian, we cannot expect to get a variational source condition with a
Hölder source function under Sobolev smoothness assumptions. Instead one obtains
logarithmic source functions $\psi$ as introduced above, which will again be optimal
and lead to very slowly decaying error estimates as $\delta \searrow 0$ [11].

Note that the rates (5.21) and (5.22) are restricted to smoothness indices $s \in (0, 1]$
and $s \in (0, a]$, respectively. This restriction to low order rates is a well-known
shortcoming of variational source conditions. Higher order rates can be obtained by
imposing a variational source condition on the dual problem (5.12), which can again
be verified by Theorem 5.5 (see [5, 18]).

With some small modifications the strategy in Theorem 5.5 can be extended to
Banach space settings [9, 21] and nonlinear forward operators $F$, in particular those
arising in inverse scattering problems [10, 20].


5.3.3 Error Bounds for Poisson Data 


We already briefly discussed discrete versions of inverse problems with Poisson
data in Sect. 5.2.1. Such problems arise in many photonic imaging modalities such as
fluorescence microscopy, coherent x-ray imaging, positron emission tomography,
but also electron microscopy. In the following we briefly discuss a continuous setting
for such problems.

We consider a forward operator $F : X \to Y$ that maps an unknown sample $f^\dagger \in X$
to a photon density $g^\dagger \in Y = L^1(\mathbb{M})$ generated by $f^\dagger$ on some measurement manifold
$\mathbb{M} \subset \mathbb{R}^d$. The given data are modeled by a Poisson process

$$
G_t = \sum_{k=1}^{N} \delta_{x_k}
$$

with density $t g^\dagger$. Here $\{x_1, \ldots, x_N\} \subset \mathbb{M}$ denote the positions of the detected photons
and $t > 0$ can be interpreted as exposure time. Note that $t \sim \mathbb{E}(N)$, i.e. the exposure
time is proportional to the expected number of detected photons.

Now to discuss error bounds we first need some notion of noise level. But what is
the "noise" in our setting? Our data $G_t$ do not belong to $Y = L^1(\mathbb{M})$ and the "noise"
is not additive. However, it follows from the properties of Poisson processes that

$$
\mathbb{E}\left[\left\langle \tfrac1t G_t, h\right\rangle\right] = \int_{\mathbb{M}} h\,g^\dagger\,dx,
$$

and the variance of $\left\langle \frac1t G_t, h\right\rangle$ is proportional to $\frac1t$. This suggests that $\frac{1}{\sqrt t}$ plays the role
of the noise level. More precisely, it is possible to derive concentration inequalities

$$
P\left(d\left(\tfrac1t G_t, g^\dagger\right) \ge \frac{r}{\sqrt t}\right) \le \exp(-cr),
$$

where the distance function $d$ is defined by a negative Besov norm (see [13, 22] for
similar results).

As a reconstruction method we consider generalized Tikhonov regularization
as in (5.8) with $\mathcal{S}_{G_t}$ given by a negative (quasi-)log-likelihood corresponding to
the Poisson data. As discussed in Sect. 5.2 this amounts to taking the Kullback-Leibler
divergence as data fidelity term in the finite dimensional case, and in particular
in the implementations of this method. (Sometimes a small shift is introduced in
the Kullback-Leibler divergence to "regularize" this term.) By assuming the VSC
(5.17) with $\psi\left(\|F(f^\dagger) - F(f)\|^2\right)$ replaced by $\psi\left(\mathrm{KL}(F(f^\dagger), F(f))\right)$ and an optimal
parameter choice $\bar\alpha$ one can then show the following error estimate in expectation:

$$
\mathbb{E}\left(\|\hat f_{\bar\alpha} - f^\dagger\|_X^2\right) = O\left(\psi\left(\frac{1}{\sqrt t}\right)\right), \qquad t \to \infty.
$$

Chapter 6
Proximal Methods for Image Processing

An Introduction to ProxToolbox for Testing Algorithms 
on the Göttingen Datasets 


D. Russell Luke 


Was ist dann eigentlich ein Hilbertischer Raum? 
— David Hilbert to John von Neumann [1] 


6.1 All Together Now 


A major challenge in building and maintaining collaborations across disciplines is to 
establish a common language. Sounds simple enough, but even a common language 
is not helpful without a common understanding of the basic elements. When it comes
to the day-to-day exchange of data and software, this means building a common data
management and processing environment. Try to do this, however, and you learn very 
quickly that even for something as concrete as building software that everyone can 
use, there are different ways of interpreting and understanding what the software does. 
In the context of X-ray diffraction, for instance, what a physicist might understand 
as a software routine that simulates the propagation of an X-ray through an optical 
device, a mathematician would understand as an operator with certain mathematical 
properties. 

The first successful algorithms for phase retrieval were developed and understood 
by physicists as iterative procedures that simulate the forward and backward prop- 
agation of a wave through an optical device, where in each iteration the computed 
wave is adjusted to fit either measurement data or some experimental constraint, 
like the shape of the aperture or the illuminating beam. Later, mathematicians rein- 
terpreted these operations in terms of the application of projectors to iterates of a 
fixed point mapping. Of most recent vintage is an effort by a new generation of 
applied mathematicians to sidestep the more interesting aspects of the physicists'
algorithms (namely that they occasionally don't work) by lifting the problem to
a space that is too high-dimensional for any practical purposes, and then relaxing
the underlying problem to something with theoretically nicer properties, but whose 
solution bears little meaningful relationship to the problem at hand. At this point, 
one is reminded of Richard Courant’s lamentation, “the broad stream of scientific 
development may split into smaller and smaller rivulets and dry out.” [2] Here’s to 
swimming against the current. 


6.1.1 What Seems to Be the Problem Here? 


The story of computational phase retrieval in X-ray imaging is a perfect example of 
how different communities can come to understand the same things in very different 
ways. This serves as a sort of origin story for the ProxToolbox [3] 


http://num.math.uni-goettingen.de/proxtoolbox/ 


which is the subject of this chapter. The setting for phase retrieval has been presented 
in Chap.2 and this will be the stuff cooked in the mathematical crucible that the 
ProxToolbox represents. 

The measurements are intensity readings which are denoted simply as a
nonnegative-valued vector $I$, with $n$ elements corresponding to the pixels in the
CCD array (see 2.12). The model for these measurements is developed in Sect. 2.1.
The various modalities (near field/far field) have the general form

$$
\left|\mathcal{D}_F(\psi)(x_k, y_j)\right|^2 = I_{kj}, \qquad k = 1, 2, \ldots, n_x,\quad j = 1, 2, \ldots, n_y. \tag{6.1}
$$

In (2.12) the model is given in the continuum with the intensity $I(x, y)$ at the
position $(x, y)$ in the measurement plane. The actual measurements consist of pixels
indexed by $k = 1, 2, \ldots, n_x$ and $j = 1, 2, \ldots, n_y$ corresponding to positions $(x_k, y_j)$
in the measurement plane. The model then represents a system of $n = n_x n_y$ equations
in $2n$ unknowns, the real and imaginary part of $\mathcal{D}_F(\psi)$ at each of the $n$ pixels. With
this discretization, the mapping $\mathcal{D}_F$ is understood as a discrete Fresnel propagator
with Fresnel number $F$ (see 2.84). To make things simpler, the indexes $k$ and $j$ are
combined uniquely into a single index $i = 1, 2, \ldots, n$, so that the data model is just

$$
\left|\left(\mathcal{D}_F(\psi)\right)_i\right|^2 = I_i, \qquad \forall\, i = 1, 2, \ldots, n. \tag{6.2}
$$
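The loss of information in a model of the form (6.2) can be seen in a small experiment. The sketch below uses a unitary discrete Fourier transform as a stand-in for the far-field propagator, which is an assumption made for illustration (the ProxToolbox propagators are not used here), and it takes the data to be squared moduli. Multiplying the object by a global phase factor or shifting it cyclically leaves all intensities unchanged.

```python
import cmath

def dft(x):
    """Unitary discrete Fourier transform (illustrative far-field propagator)."""
    n = len(x)
    return [sum(x[k] * cmath.exp(-2j * cmath.pi * j * k / n) for k in range(n)) / n ** 0.5
            for j in range(n)]

def intensities(psi):
    """Data model: I_i = |(D psi)_i|^2 for every pixel i."""
    return [abs(v) ** 2 for v in dft(psi)]

psi = [1.0 + 0.5j, -0.2j, 0.7, 0.3 - 0.1j]      # hypothetical object
psi_rot = [cmath.exp(0.9j) * v for v in psi]    # same object with a global phase
psi_shift = psi[1:] + psi[:1]                   # same object, cyclically shifted

I = intensities(psi)
I_rot = intensities(psi_rot)
I_shift = intensities(psi_shift)
```

All three objects produce the same data, so a single intensity measurement cannot distinguish between them without additional constraints.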


The solution to the phase retrieval problem as presented here is the complex-valued
vector $\psi \in \mathbb{C}^n$ that satisfies (6.2) or, more generally, (6.3). With only a single measurement
the problem is underdetermined, that is, too many unknowns and not enough
equations. To further constrain the problem, there are a number of possibilities.
First, one could (and should) include a priori information implicit in the experiment
(support, nonnegativity, sparsity, etc). The next thing one could try is to adjust the
instrument in some known fashion and take several measurements of the same object.
The mapping $\mathcal{D}_F$ is the model for the mapping of the electromagnetic field at the
object to the field where the intensity is measured, either near field or far field. For
the purpose of this tutorial, these are mappings from vectors in $\mathbb{C}^n$ to vectors in $\mathbb{C}^n$.
This can be modified in a number of different ways. Representing $n$-dimensional
complex-valued vectors instead as two-dimensional vectors on an $n$-dimensional
product space, the phase retrieval problem is to find $\psi = (\psi_1, \psi_2, \ldots, \psi_n) \in (\mathbb{R}^2)^n$
($\psi_i \in \mathbb{R}^2$) satisfying (6.2) for all $i$ in addition to qualitative constraints and/or
additional measurements.¹

Ptychography was briefly mentioned in Chap.2. Ptychography is harder to say 
than to describe. A quilt is a ptychogram of sorts. Or put in more technical terms, 
ptychography is a combination of blind deconvolution and computed tomography 
for phase retrieval. The original idea, proposed by Hegerl and Hoppe [4], was just
the computed tomography part: to stitch together the original object $\psi$ from many
measurements at different settings of the instrument, modeled by $\mathcal{D}_j$, the $j$ indexing
the setting. One of the implicit complications of conventional ptychography, which
differs from the original, is that the illuminating beam is also unknown; this is the
blind deconvolution part.

In the above, different Fresnel numbers correspond to collecting the intensities $I_j$
at different planes orthogonal to the direction of propagation. This further constrains
the problem. The model is only a little more involved than simple phase retrieval (6.2).
For $m$ different intensity measurements, each consisting of $n$ pixels, the generalized
ptychography/phase diversity model takes the form

$$
\left|\left(\mathcal{D}_j(\psi)\right)_i\right|^2 = I_{ij}, \qquad \forall\, i = 1, 2, \ldots, n,\quad \forall\, j = 1, 2, \ldots, m. \tag{6.3}
$$


This fits the first ptychographic reconstruction procedure [4], which assumed that
the illuminating beam, characterized by $\mathcal{D}_j$, was known. In the blind ptychography
problem, analogous to blind deconvolution, the beam is not completely known.
This corresponds to what is commonly understood by ptychography in modern applications
[5-8]. To account for the unknown beam characteristics, the mapping $\mathcal{D}_j$ is
further decomposed:

$$\mathcal{D}_j(z, \psi) := \mathcal{F}\left(S_j(z) \odot \psi\right). \tag{6.4}$$


Here $\mathcal{F}$ is a parameter-free propagator and $S_j : \mathbb{C}^n \to \mathbb{C}^n$
denotes the $j$-th linear operator representing some known adjustment to the beam (a lateral
shift, or a translation in the direction of propagation), $z \in \mathbb{C}^n$ is the unknown
vector characterizing the probe, and $\odot$ is the elementwise Hadamard product.
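The decomposition (6.4) can be sketched in a few lines. This is only an illustration under stated assumptions: the propagator $\mathcal{F}$ is taken to be a unitary discrete Fourier transform, each $S_j$ is a cyclic lateral shift of the probe, and the names `dft`, `shift`, `forward` and `intensities` are hypothetical, not part of the ProxToolbox.

```python
import cmath

def dft(x):
    """Unitary discrete Fourier transform, standing in for the propagator F."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n)) / n ** 0.5
            for k in range(n)]

def shift(z, s):
    """Cyclic lateral shift of the probe z by s pixels (one possible choice of S_j)."""
    n = len(z)
    return [z[(i - s) % n] for i in range(n)]

def forward(z, psi, s):
    """D_j(z, psi) = F(S_j(z) * psi), with * the elementwise (Hadamard) product."""
    return dft([a * b for a, b in zip(shift(z, s), psi)])

def intensities(z, psi, shifts):
    """Simulated data I_ij = |(D_j(z, psi))_i|^2, one row per setting j, as in (6.3)."""
    return [[abs(v) ** 2 for v in forward(z, psi, s)] for s in shifts]

# Assumed toy data: a probe lighting two pixels and a pure phase object.
probe = [1.0, 0.5, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
obj = [cmath.exp(0.3j * i) for i in range(8)]
data = intensities(probe, obj, shifts=[0, 2, 4, 6])
```

Because the assumed propagator is unitary, each row of `data` sums to the energy of the shifted probe, which is a convenient sanity check on simulated data.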

The problem in blind ptychography is to reconstruct simultaneously the object
$\psi$ and illuminating beam $z$ from a given ptychographic dataset. Near-field, or in-line,
ptychography [9] involves moving the imaging plane along the axis of propagation
of the beam (i.e., away from the plane where the object lies). This is very similar to
phase diversity in astronomy [10], but there one changes the defocus in the far field
instead of the imaging plane in the near field (also, one does not have to recover the
beam in astronomy applications). Mathematically, however, the two instances have
the same structure. In conventional far-field ptychographic experiments the beam is
much smaller than the specimen. The different measurements consist of scans of $\psi$
in the lateral direction with sufficient overlap between successive images. Lateral
translations and translations along the axis of propagation were combined in [11].
Note that the last case is least restrictive in terms of probe properties; see also [12]
for a detailed comparison and discussion.

¹ The issue of whether to represent the vectors as points in $\mathbb{R}^{2n}$ or points in
$\mathbb{C}^n$ is notational. The representation as points in $\mathbb{C}^n$ is more convenient
for the purpose of explanation, but on a computer you will need to work with $\mathbb{R}^{2n}$.

The issue of existence of a $\psi$ that satisfies all the equations is discussed in Chap. 23.
For this chapter, existence is recast as consistency of the measurements and the
physical model. The data is exact.² What is not entirely accurate is the model for the
data and the computational bandwidth, e.g. finite precision arithmetic.

Though it might seem unintuitive, for algorithmic reasons it is better to separate
the aspects of the imaging model having to do with the field at the object plane from
those having to do with the field at the image plane. Denote $u = (u_1, u_2, \ldots, u_m)$
with $u_j \in \mathbb{C}^n$ ($j = 1, 2, \ldots, m$) and define the measurement sets

$$M_j := \left\{ u_j \in \mathbb{C}^n \;\middle|\; |(\mathcal{F}u_j)_i| = \sqrt{I_{ij}}, \ \forall i = 1, 2, \ldots, n \right\} \quad (j = 1, 2, \ldots, m) \tag{6.5}$$

where $\mathcal{F}$ is the parameter-free propagator in (6.4). The sets $M_j$ are nothing more
than the phase sets, or the set of all vectors that could explain the data. Solutions to the
most general physical model represented by (6.3) consist of any triple of vectors
$(z, \psi, u) \in \mathbb{C}^n \times \mathbb{C}^n \times (\mathbb{C}^n)^m$ in the set


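$$\mathbb{M} := \left\{ (z, \psi, u) \in \mathbb{C}^n \times \mathbb{C}^n \times (\mathbb{C}^n)^m \;\middle|\; S_j(z) \odot \psi = u_j, \ j = 1, 2, \ldots, m \right\} \tag{6.6}$$

such that $u_j \in M_j$ and the beam $z$ and object $\psi$ satisfy any other reasonable qualitative
constraints. The constraints on the unknowns $z$, $\psi$ and $u$ are separable and given by

$$X := \{\text{qualitative constraints on the probe}\}, \tag{6.7a}$$
$$O := \{\text{qualitative constraints on the specimen}\}, \tag{6.7b}$$
$$M := M_1 \times M_2 \times \cdots \times M_m \quad \text{(measurement constraints)}. \tag{6.7c}$$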


As before, the qualitative constraints characterized by $X$ and $O$ are support,
support-nonnegativity or magnitude constraints corresponding respectively to whether the
illumination and specimen are supported on a bounded set, whether these (most
likely only the specimen) are “real objects” that somehow absorb or attenuate the
probe energy, or whether these are “phase objects” with a prescribed intensity but
varying phase. A support constraint for the set $X$, for instance, would be represented
by

$$X := \left\{ z = (z_1, \ldots, z_n) \in \mathbb{C}^n \;\middle|\; |z_i| \le R \ \text{and for } i \notin I_X, \ z_i = 0 \right\}, \tag{6.8}$$

where $I_X$ is the index set corresponding to which pixels in the field of view the probe
beam illuminates and $R$ is some given amplitude.

² This is a minority opinion, but it seems to be the height of hubris to think that an empirical
observation is an approximation to a theoretical model rather than the other way around. The only
thing that is indisputable is that the instrument behavior and the predicted behavior don’t match up
as well as desired.


6.1.2 What Is an Algorithm? 


One of the best known algorithms for phase retrieval is Fienup’s Hybrid Input-Output 
algorithm (HIO) [13] discussed in 2.88. This algorithm illustrates just about every 
difficulty one encounters in collaborating across disciplines, and sets the stage for 
the rest of the tutorial below. In the present setting, HIO with a support constraint is 
given as 


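$$(\forall k \in \mathbb{N})(\forall i = 1, 2, \ldots, n) \quad \psi_i^{k+1} = \begin{cases} (P_M(\psi^k))_i & \text{if } i \in D, \\ \psi_i^k - \beta_k (P_M(\psi^k))_i & \text{otherwise.} \end{cases} \tag{6.9}$$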


Here $\mathbb{N}$ is the set of counting numbers, $D$ indicates an index set where it is imagined
that some constraints are satisfied; the mapping $P_M$ fits the current iterate $\psi^k$ to the
data by propagating this guess through a model optical system, fixing the amplitude
at the measurement surface to match the observed intensity and then propagating
the resulting field back to the plane of the object. Putting (2.84) into the present
context, with the set $M$ consisting of just a single intensity image, $M = M_1$, yields

$$\hat{\psi} = P_M(\psi) \approx \mathcal{D}_F^{-1}(\hat{y}), \quad \text{where} \quad \hat{y}_i = \sqrt{I_i}\, \frac{(\mathcal{D}_F(\psi))_i}{|(\mathcal{D}_F(\psi))_i|}. \tag{6.10}$$

The symbol $\approx$ indicates that this is not really an equivalence relation: for the moment
think of it as equivalence with exceptions.

One obvious exception is when $(\mathcal{D}_F(\psi))_i = 0$. If you are a physicist you might
argue that $(\mathcal{D}_F(\psi))_i = 0$ on a set of measure zero, a fancy way of saying never,
with infinite precision arithmetic. Except that electronic processors operate with
finite precision arithmetic and zero is therefore enormous on a computer. In fact, with
double precision, zero is not smaller than 1e-16. To see what kind of error this can
lead to, suppose that $(\mathcal{D}_F(\psi))_i = -$1e-15 with infinite precision arithmetic, but
because of roundoff, the computer returns 1e-15. A very small difference locally.
But suppose that, at this pixel, the measured amplitude is $\sqrt{I_i} = 10$. In computing the
projection, the computer returns a point with $\hat{y}_i = 10$ instead of $\hat{y}_i = -10$. This
makes an enormous difference. The typical user won’t see this kind of error often,
but in numerical studies in [14] it happened about 12% of the time, which is not
insignificant. Anyone with experience in programming knows to be careful about
dividing. Without thinking much more about it, one usually codes some exception
to avoid problems when division gets too dicey.
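The sign flip described above is easy to reproduce. The following is a minimal sketch for a single real-valued pixel; the function name `project_amplitude` and the tolerance `tol` are assumptions made for the illustration, not a prescription for how the exception should be coded.

```python
def project_amplitude(y, amplitude, tol=1e-14):
    """Rescale the real number y to the prescribed amplitude, keeping its sign.

    Below tol the sign of y is treated as unknown and +amplitude is selected;
    -amplitude would be an equally near point.
    """
    if abs(y) <= tol:
        return amplitude
    return amplitude * (y / abs(y))

exact = -1e-15     # value in infinite precision arithmetic (assumed)
computed = 1e-15   # value returned after a hypothetical roundoff error

# With no guard (tol=0.0) the two inputs land on opposite sides of the sphere.
unguarded = (project_amplitude(exact, 10.0, tol=0.0),
             project_amplitude(computed, 10.0, tol=0.0))

# With the guard both inputs give the same, explicitly chosen, selection.
guarded = (project_amplitude(exact, 10.0), project_amplitude(computed, 10.0))
```

The guard does not remove the ambiguity; it only makes the choice deliberate and reproducible instead of leaving it to roundoff.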

But there is another reason to pay attention to this exceptional point: convexity.
The mathematical understanding of this and other phase retrieval algorithms originating
from the optics community begins with viewing the entire collection of points
satisfying the data and the qualitative constraints as sets and then viewing the operations
in the above iterative procedures as metric projections onto these sets. A metric
projector of a point $\psi$ onto a set $C$ is simply an operator that maps $\psi$ to all points in
$C$ that are nearest to $\psi$. The operation $P_M$ in (6.10) was long called a projection in
the optics literature, but it was not shown to be a metric projector until [14, Corollary
4.3] where it was pointed out that the projector is set-valued in general. Being more
careful one should write

$$\hat{\psi} \in P_M(\psi) = \mathcal{D}_F^{-1}(\hat{y}), \quad \text{where} \quad \hat{y}_i \in \begin{cases} \sqrt{I_i}\, \dfrac{(\mathcal{D}_F(\psi))_i}{|(\mathcal{D}_F(\psi))_i|} & \text{if } (\mathcal{D}_F(\psi))_i \neq 0, \\[2ex] \sqrt{I_i}\, \mathbb{S} & \text{if } (\mathcal{D}_F(\psi))_i = 0. \end{cases} \tag{6.11}$$


The symbol $\mathbb{S}$ denotes the unit sphere in the complex plane ($\mathbb{R}^2$) and
$\sqrt{I_i}\,\mathbb{S}$ is the sphere of radius $\sqrt{I_i}$. The symbol $\in$ is a reminder that
the right hand side is a set of elements, and the left hand side is a selection (any selection)
from this set. The change in notation from “=” to “$\in$” is not just pedantic nit-picking, but
underscores the fact that the problem is nonconvex: the set $M$ is nonconvex, meaning that you
can find two points in the set where the line segment joining the two points leaves the set.
In (6.11) take two points on opposite ends of the sphere $\sqrt{I_i}\,\mathbb{S}$: the line
segment joining them is not in $\sqrt{I_i}\,\mathbb{S}$. If the sets $M$ were convex (line
segments joining any two points in the set are contained in said set), then the projector would
be single-valued and one could forget the whole technicality, not to mention any worries about
numerical instability.

Returning to iterative algorithms just in the context of phase retrieval (the probe $z$
is known and there is only a single measurement $M = M_1$ so that the variables $u$ are
not needed), the procedure (6.9) is a natural way to think of a numerical procedure
when approaching things physically: one makes a guess for the object $\psi^0$, propagates
it through the optical system according to the model given by (6.10) and updates this
guess to $\psi^1$ depending on whether the elements satisfy the data and some a priori
constraint. And repeat. The user would stop the iteration either when he needed to go
for coffee or when the iterates stop making progress, in some loosely defined way.
The subtle point here is that (6.9) is not a fixed point iteration, but rather a simulation
of a physical process.

Bauschke, Combettes and Luke [15] showed that the HIO algorithm (6.9) is
equivalent to

$$\psi^{k+1} \in \tfrac{1}{2}\left( R_G\left(R_M + (\beta_k - 1)P_M\right) + \mathrm{Id} + (1 - \beta_k)P_M \right)(\psi^k), \tag{6.12}$$

where $\mathrm{Id}$ is the identity mapping (does nothing to the point), $G$ is the set of vectors
satisfying a bounded support constraint, and

$$R_M := 2P_M - \mathrm{Id} \quad \text{and} \quad R_G := 2P_G - \mathrm{Id} \tag{6.13}$$

with

$$(P_G(\psi))_j = \begin{cases} \psi_j & \text{if } j \in D, \\ 0 & \text{else.} \end{cases} \tag{6.14}$$

The mappings $R_G$ and $R_M$ are called reflectors because they send points to their
reflection points on the opposite side of the set onto which one projects. The iteration
(6.12) is the Hybrid Projection Reflection (HPR) algorithm proposed in [15]. When
$\beta_k = 1$ for all $k$, then the iteration takes the form [16]

$$\psi^{k+1} \in \tfrac{1}{2}\left( R_G R_M + \mathrm{Id} \right)(\psi^k). \tag{6.15}$$


This is the popular Douglas-Rachford Algorithm [17] following the formulation of
Lions and Mercier in the context of monotone operator equations [18]. In both
cases, the desired point is a fixed point of the mapping $T$, that is, a point $\psi$ such
that $\psi = T\psi$ where either
$T = \tfrac{1}{2}\left( R_G(R_M + (\beta_k - 1)P_M) + \mathrm{Id} + (1 - \beta_k)P_M \right)$ or
$T = \tfrac{1}{2}\left( R_G R_M + \mathrm{Id} \right)$.
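The identification of (6.9) with (6.15) can be checked numerically. The sketch below is an illustration under assumptions: the propagator is a unitary discrete Fourier transform, the projector onto $M$ uses one fixed selection at (near) zero entries, and all function names are hypothetical rather than taken from the ProxToolbox.

```python
import cmath
import random

def dft(x, sign=-1):
    """Unitary DFT (sign=-1) or its inverse (sign=+1), standing in for the propagator."""
    n = len(x)
    return [sum(x[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n) for t in range(n)) / n ** 0.5
            for k in range(n)]

def P_M(psi, amp):
    """One selection from the projector (6.11): keep the phases, impose the amplitudes."""
    y = dft(psi)
    y_hat = [a * (v / abs(v)) if abs(v) > 1e-14 else a for v, a in zip(y, amp)]
    return dft(y_hat, sign=+1)

def P_G(psi, D):
    """Projector (6.14) onto the support constraint with index set D."""
    return [v if i in D else 0.0 for i, v in enumerate(psi)]

def hio_step(psi, amp, D, beta=1.0):
    """One step of (6.9)."""
    p = P_M(psi, amp)
    return [p[i] if i in D else psi[i] - beta * p[i] for i in range(len(psi))]

def dr_step(psi, amp, D):
    """One step of (6.15): (1/2)(R_G R_M + Id)(psi)."""
    p = P_M(psi, amp)
    r_m = [2 * a - b for a, b in zip(p, psi)]            # R_M psi
    r_g = [2 * a - b for a, b in zip(P_G(r_m, D), r_m)]  # R_G R_M psi
    return [0.5 * (a + b) for a, b in zip(r_g, psi)]

# Assumed toy problem: an object supported on three of eight pixels.
random.seed(0)
n, D = 8, {0, 1, 2}
truth = [complex(random.random(), random.random()) if i in D else 0j for i in range(n)]
amp = [abs(v) for v in dft(truth)]
psi = [complex(random.random(), random.random()) for _ in range(n)]
gap = max(abs(a - b) for a, b in zip(hio_step(psi, amp, D), dr_step(psi, amp, D)))
```

For $\beta_k = 1$ the two updates agree up to rounding, so `gap` is at the level of machine precision; for other values of `beta` the comparison would have to be made against (6.12) instead.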

Whether or not the iterations above converge to a fixed point, and what this point 
has to do with the problem at hand is the subject of Chap. 23. For the purposes of 
this tutorial, the algorithm will simply be run with the given mappings and the user 
will be left to interpret the result. A few tools are provided within the ProxToolbox 
to monitor the iterates according to mathematical, as opposed to physical, criteria. 
For the beginning reader all that is important to keep in mind is that, first of all,
the algorithms don’t always converge, and second of all, when they converge, the
limit point is not a solution to the problem you thought you were solving, but you
can usually get there easily from the limiting fixed point. Another issue to keep
in mind is that the physical criteria that scientists apply to judge the quality of a
computed solution usually do not correspond to the mathematical criteria used
to characterize and quantify convergence of an algorithm. It is not uncommon to
see pictures of “solutions” returned by various algorithms at iteration k, or to see a 
comparison of a root mean-squared error estimate of an iterate of various algorithms. 
This is, from a mathematical perspective, not really meaningful for several reasons. 
The first reason is that, unless the underlying optimization problem is to minimize the 
root mean-squared error of something, there is no reason to expect that an algorithm 
should do this. The second reason is that, as already mentioned, the iterates of some 
algorithms, like Douglas-Rachford, are not the points that approximate solutions to 
the desired optimization problem, but their shadows, defined as the projection of 
these points onto a relevant set, are. Comparison of the quality of solutions returned 
by algorithms is common, but it should be recognized that such comparisons are not 
mathematical, but rather phenomenological, if of any scientific significance at all. 
The identification of the HIO algorithm with Douglas-Rachford, at least for one
parameter value, made a lot of sense when it was first discovered. The HIO algorithm
is famously unstable. The way most people use it today is to get themselves in the
neighborhood of a good solution by running 10-40 iterations of HIO, at which point
they switch to a more stable algorithm to clean up their images. The value of HIO or
Douglas-Rachford is that they rarely get stuck in local minima. The identification
with Douglas-Rachford makes this phenomenon clear since it can be proved that, 
if the sets G and M do not intersect, then Douglas-Rachford does not possess fixed 
points. If M were convex, then you could even prove that the iterates must diverge to 
infinity in the direction of the gap vector between best approximation pairs between 
the sets [19]. For nonconvex problems like noncrystallographic phase retrieval, the 
iterates need not diverge, but they cannot converge. In Chap. 23 the convergence 
theory is discussed in some detail. 

As should be clear by now, there is no equation being solved here, but rather some 
point is sought, any point, that satisfies an equation and any other kind of requirement 
one might like to add. It is high time to bring the main character of this story to the 
stage. In the most general format (ptychography) this has the form 


General Problem 


$$\text{Find} \quad (z, \psi, u) \in \mathbb{M} \cap (X \times O \times M).$$


This is a feasibility problem and what algorithms like (6.12) and (6.15) aim to
solve, if possible. For the moment, it is easiest to examine the more elementary phase
retrieval problem (ptychography with one measurement):


$$\text{Find} \quad \psi \in G \cap M. \tag{6.16}$$


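The sets $G$ and $M$ have the explicit characterizations

$$M := \left\{ \psi \in \mathbb{C}^n \;\middle|\; |(\mathcal{D}_F(\psi))_i| = \sqrt{I_i}, \ i = 1, 2, \ldots, n \right\} \tag{6.17}$$

and

$$G := \left\{ \psi \in \mathbb{C}^n \;\middle|\; \psi_i = 0 \ \text{if } i \notin D \right\}. \tag{6.18}$$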


The projectors onto these sets are given by (6.11) and (6.14). When the requirements 
become so narrow that no point can satisfy all of them, the problem is said to be 
inconsistent. The reason for the instability of HIO and Douglas-Rachford lies with 
the failure of the sets G and M to have points in common. This indicates a fundamental 
inconsistency of the physical model. 
The behavior of these algorithms in the presence of model inconsistency tends
to be mistaken for another bugbear of inverse problems, namely nonuniqueness. For
example, around the turn of the millennium, oversampling in the image domain was
proposed to overcome the nonuniqueness of phase reconstructions for noncrystallographic
observations [20]. A few moments’ reflection on elementary Fourier analysis,
and careful reading of Chap.? is all you need to convince yourself, however, that
oversampling has less to do with uniqueness than with inconsistency. Increased, 
but still finite, sampling in the image domain just pushes the inconsistency, or gap, 
between the sets G and M to some level below either your numerical or experimental 
precision. It might look like the iteration has converged, but what has really happened 
is that the movement of the iterates has become so small that it is no longer detectable 
with a fixed arithmetic precision. This is just a physical manifestation of the fact from 
Fourier analysis that objects with compact support do not have compactly supported 
Fourier transforms, and vice versa. Since the measurements are finite, the object that 
is recovered cannot be finite. This means that the only time phase retrieval can be 
consistent is when imaging periodic crystals. 

To be sure, uniqueness is nice when you have it, but you first need to clear up
the issue of uniqueness of what. No wave $\psi$ with compact support can generate the
given intensity $I$. When the problem is inconsistent, it suffices to find some point,
any point, that comes as close to both sets as possible. This is a best approximation
problem and takes the form of the usual optimization problem:

$$\underset{\psi \in \mathbb{C}^n}{\text{minimize}} \quad \frac{\lambda}{2(1-\lambda)}\, \operatorname{dist}^2(\psi, G) \quad \text{subject to} \quad \psi \in M. \tag{6.19}$$


The reason for minimizing the distance squared instead of the distance is to have
a nice smooth objective function; it doesn’t really change the problem. Nor does
the factor $\frac{\lambda}{2(1-\lambda)}$ out front, but it has a huge impact on the next
algorithm, which solves (6.19). So the question of uniqueness amounts to whether problem (6.19)
has a unique solution. Experts in optimization don’t often worry about uniqueness,
but rather the existence of local minima to (6.19). This is one of the few remaining
unresolved mathematical issues in phase retrieval.

While, in most applications, the projections onto the sets $X$, $O$ and $M$ in (6.7)
have a closed form and can be computed very accurately and efficiently, there does
not seem to be any method, analytic or otherwise, for computing the projection onto
the set $\mathbb{M}$ defined by (6.6). This might be another good reason for avoiding a
feasibility model. Indeed, if the projections are too difficult, or impossible to compute
analytically, then a large part of the advantage of projection methods evaporates.
Nevertheless, this essentially two-set feasibility model suggests a wide range of
techniques within the family of projection methods, alternating projections, averaged
projections and Douglas-Rachford being representative members. In contrast
to these, methods based on optimization models can avoid the difficulty of computing
a projection onto the set $\mathbb{M}$ by instead minimizing a nonnegative coupling function
that takes the value 0 (only) on $\mathbb{M}$. The model for this is a constrained least squares
minimization model

$$\text{Find} \quad (\hat{z}, \hat{\psi}, \hat{u}) \in \operatorname{argmin} \left\{ F(z, \psi, u) \;\middle|\; (z, \psi, u) \in X \times O \times M \right\} \tag{6.20}$$


where

$$F(z, \psi, u) := \sum_{j=1}^{m} \left\| S_j(z) \odot \psi - u_j \right\|^2. \tag{6.21}$$

What has happened here is that the set $\mathbb{M}$ defined by (6.6) has been replaced by
the least squares objective to avoid the complication of computing the projection
onto $\mathbb{M}$.
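As a small illustration of the coupling function (6.21), the sketch below evaluates $F$ for a toy probe and object. It assumes, purely for the example, that each $S_j$ is a cyclic shift of the probe; the name `coupling` and the data are hypothetical.

```python
def coupling(z, psi, u, shifts):
    """F(z, psi, u) = sum_j || S_j(z) * psi - u_j ||^2, with S_j a cyclic shift by shifts[j]."""
    n = len(z)
    total = 0.0
    for s, u_j in zip(shifts, u):
        shifted = [z[(i - s) % n] for i in range(n)]
        total += sum(abs(a * b - c) ** 2 for a, b, c in zip(shifted, psi, u_j))
    return total

# Assumed toy data: four pixels, two probe positions.
z = [1.0, 0.5, 0.0, 0.0]
psi = [1j, 1.0, -1.0, 2.0]
shifts = [0, 1]

# A point satisfying the coupling constraint in (6.6): u_j = S_j(z) * psi.
u = [[z[(i - s) % 4] * psi[i] for i in range(4)] for s in shifts]
value_on_set = coupling(z, psi, u, shifts)

# Moving one entry of u off the constraint makes F strictly positive.
u_off = [list(row) for row in u]
u_off[0][0] += 0.1
value_off_set = coupling(z, psi, u_off, shifts)
```

The function vanishes exactly when every $u_j$ equals $S_j(z) \odot \psi$, which is the sense in which minimizing $F$ replaces the projection onto the set defined by (6.6).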

The relaxed averaged alternating reflections algorithm (RAAR³) first proposed in
[21] addresses the instability of the Douglas-Rachford algorithm by anchoring the
usual Douglas-Rachford iterates to one of the sets:

$$\psi^{k+1} \in \left( \tfrac{\lambda}{2}\left( R_G R_M + \mathrm{Id} \right) + (1 - \lambda)P_M \right)(\psi^k). \tag{6.22}$$

When $\lambda \in [0, 1)$ it was shown in [22] that this algorithm is equivalent to the
Douglas-Rachford algorithm applied to (6.19). For the moment, just recognise that this is a
convex combination of (6.15) with the projection onto the set $M$. If one wanted to
play around further, the constraints $G$ and $M$ can be changed without changing the
form of the fixed point iterations. For instance, if the thing one is trying to recover, $\psi$,
is actually an electron density, it should be a real-valued, nonnegative vector; so instead
of the set $G$, one should restrict the possible points to

$$G_+ := \left\{ \psi \in \mathbb{R}^n \;\middle|\; \psi_i \in \begin{cases} \mathbb{R}_+ & \text{if } i \in D \\ \{0\} & \text{else} \end{cases} \right\}. \tag{6.23}$$


The RAAR algorithm then takes the form

$$\psi^{k+1} \in \left( \tfrac{\lambda}{2}\left( R_{G_+} R_M + \mathrm{Id} \right) + (1 - \lambda)P_M \right)(\psi^k),$$

which is hardly a change from (6.22). In fact, mathematically there is no qualitative
difference between the two. When translated back to the format of the original HIO
algorithm, this takes the form [21, Prop. 2.1]⁴:

$$(\forall k \in \mathbb{N})(\forall i = 1, 2, \ldots, n) \quad \psi_i^{k+1} = \begin{cases} (P_M(\psi^k))_i & \text{if } i \in D \text{ and } (R_M(\psi^k))_i \ge 0, \\ \lambda\psi_i^k + (1 - 2\lambda)(P_M(\psi^k))_i & \text{otherwise.} \end{cases} \tag{6.24}$$

³ The names for these algorithms have evolved since their first introduction. In [19] the procedure that
is today known as the Douglas-Rachford algorithm was called the Averaged Alternating Reflections
algorithm, which then explains the genesis of the name RAAR for (6.22). Since Douglas-Rachford
is more or less the accepted name for (6.15), (6.22) is called DRA in more recent mathematical
articles. Nevertheless, RAAR is more common in the physics literature, so that is the nomenclature
used here. In the ProxToolbox, however, the DRA nomenclature is used.

⁴ (6.24) corrects a sign error in the lower half of [21, Eq. (14)].

Comparing (6.24) with (6.9), it is clear that these are very different algorithms. If 
the object domain constraint were to change, (for instance, to a magnitude constraint 
in the object plane for a complex-valued object) the physical description analogous 
to (6.24) would change dramatically yet again, but the description as a fixed point 
mapping would always have the form 


$$\psi^{k+1} \in T_{RAAR}\,\psi^k := \left( \tfrac{\lambda}{2}\left( R_O R_M + \mathrm{Id} \right) + (1 - \lambda)P_M \right)(\psi^k) \tag{6.25}$$


where $O$ is a placeholder for the constraint in the physical domain (see also (2.89)).
The main point here is that the mathematical properties of the fixed point mapping
$T_{RAAR}$ depend on the properties of the sets $M$ and $O$, but the algorithm is always the
same.
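The agreement between the pointwise description (6.24) and the operator form with the set $G_+$ can be checked on a toy problem. To keep the sketch short it is assumed that the propagator is the identity and that all fields are real, so that $M = \{\psi \mid |\psi_i| = a_i\}$ and $P_M$ only fixes magnitudes; the function names are hypothetical.

```python
def P_M(psi, amp):
    """Selection from the projector onto {psi : |psi_i| = amp_i} for real psi."""
    return [a if v >= 0 else -a for v, a in zip(psi, amp)]

def P_Gplus(psi, D):
    """Projector onto G_+: nonnegative on the support D, zero elsewhere."""
    return [max(v, 0.0) if i in D else 0.0 for i, v in enumerate(psi)]

def raar_operator(psi, amp, D, lam):
    """Operator form: (lam/2 (R_{G+} R_M + Id) + (1 - lam) P_M)(psi)."""
    p = P_M(psi, amp)
    r_m = [2 * a - b for a, b in zip(p, psi)]
    r_g = [2 * a - b for a, b in zip(P_Gplus(r_m, D), r_m)]
    return [0.5 * lam * (a + b) + (1 - lam) * c for a, b, c in zip(r_g, psi, p)]

def raar_pointwise(psi, amp, D, lam):
    """Pointwise form (6.24)."""
    p = P_M(psi, amp)
    out = []
    for i, (v, q) in enumerate(zip(psi, p)):
        if i in D and 2 * q - v >= 0:      # (R_M psi)_i >= 0 on the support
            out.append(q)
        else:
            out.append(lam * v + (1 - 2 * lam) * q)
    return out

# Assumed toy data.
amp = [1.0, 2.0, 0.5, 1.5]
D = {0, 1}
psi = [0.3, -1.2, 0.7, -0.4]
diff = max(abs(a - b) for a, b in zip(raar_operator(psi, amp, D, 0.75),
                                      raar_pointwise(psi, amp, D, 0.75)))
```

Changing the object domain constraint only means swapping `P_Gplus` for another projector in `raar_operator`, whereas the pointwise version would have to be rederived by hand.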

From this point hence, the word algorithm will be used more or less synonymously
with the phrase fixed point iteration. This will be a convenient way to pack several
(hundreds of) lines of code into a single symbol $T$, for the fixed point mapping. This
$T$ takes a guess $x^0$ and replaces it with an update $x^1$. In mathematical terms, $T$ maps
$x^0 \in X$ to $x^1 \in X$ where $X$ is the domain and image space of $T$, the shorthand for
which is $T : X \to X$. The domain and image spaces need not be the same, but for
fixed point iterations they are. One important feature of this way of thinking about
things is that the guess and the update are the same kinds of objects with the same
physical interpretation. This is different than a function or more generally a relation
which can map a point to anything, for instance a number, a color, or a set. A fixed
point iteration is the process of repeatedly applying the fixed point mapping $T$: given
$x^0$, generate a sequence of points $x^k$ via

$$(\forall k \in \mathbb{N}) \quad x^{k+1} = Tx^k. \tag{6.26}$$

There are a number of accessories one can add to (6.26). These take the form:
given $x^0$, and an update rule $A_k$ ($k = 1, 2, \ldots$), generate a sequence of points $x^k$ via

$$(\forall k \in \mathbb{N}) \quad y^{k+1} = Tx^k, \qquad x^{k+1} = A_k(y^{k+1}). \tag{6.27}$$

The main difference between (6.26) and (6.27) is that in the latter the operations
from one iteration to the next can evolve and adjust along with the iterations. These
are invariably called accelerations because that is the name of the game.
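The two templates (6.26) and (6.27) fit into a single loop. The sketch below is a minimal illustration, not the ProxToolbox driver: the function name `iterate`, the stopping rule based on the step size, and the over-relaxation used as the update rule are all assumptions. The update rule is given access to the previous iterate as well, since most accelerations need some history.

```python
def iterate(T, x0, accelerate=None, max_iter=1000, tol=1e-10):
    """Fixed point iteration x^{k+1} = T x^k, optionally followed by an update rule A_k.

    accelerate(k, x, y) receives the iteration counter, the previous iterate x and
    the plain update y = T(x), and returns the next iterate.
    Returns the last iterate and the number of iterations performed.
    """
    x = x0
    for k in range(max_iter):
        y = T(x)                                            # (6.26), first line of (6.27)
        x_new = accelerate(k, x, y) if accelerate else y    # second line of (6.27)
        step = max(abs(a - b) for a, b in zip(x_new, x))
        x = x_new
        if step <= tol:
            break
    return x, k + 1

# Assumed toy mapping with the unique fixed point x = 2.
T = lambda x: [0.5 * v + 1.0 for v in x]

# Assumed update rule: over-relaxation, x + 1.5 (y - x).
relax = lambda k, x, y: [a + 1.5 * (b - a) for a, b in zip(x, y)]

x_plain, n_plain = iterate(T, [0.0])
x_fast, n_fast = iterate(T, [0.0], accelerate=relax)
```

In this toy case the over-relaxed iteration needs fewer steps to meet the same tolerance. Note that the stopping rule monitors the step length, a mathematical criterion, and says nothing about the physical quality of the iterate.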
6.1.3 What Is a Proximal Method?


A proximal method is a fixed point iteration (6.26) or its acceleration (6.27) where 
the fixed point mapping T consists of proximal mappings. A proximal mapping has 
a specific mathematical definition, but before giving this an intuitive description will 
probably be more helpful. A proximal mapping sends a point to one or more points 
that strike a balance between solving a minimization problem and staying close to 
the original point. The projectors of the previous section are proximal mappings. To 
see this, consider the function 


$$\iota_\Omega(u) := \begin{cases} 0 & \text{if } u \in \Omega, \\ +\infty & \text{else,} \end{cases} \tag{6.28}$$

where $\Omega$ is some set. Allowing this function to take the value $+\infty$ is very convenient.
The optimization problem corresponding to minimizing $\iota_\Omega$ while staying as close as
possible to $\bar{u}$ is

$$\underset{u}{\text{minimize}} \quad \iota_\Omega(u) + \tfrac{1}{2\lambda}\|u - \bar{u}\|^2,$$

and the solution to this problem is written

$$\operatorname{argmin}_u \left\{ \iota_\Omega(u) + \tfrac{1}{2\lambda}\|u - \bar{u}\|^2 \right\}.$$

The parameter $\lambda > 0$ will become important in a moment, but it has no significance
in this context since the solution to the optimization problem above is the same for
all positive values of $\lambda$. The solution to this problem is the set of points in $\Omega$ that
are nearest to $\bar{u}$, or the set $P_\Omega \bar{u}$. This should not be confused with the optimal
value of this problem, which in this context is just the squared distance of the point $\bar{u}$ to
the set $\Omega$, scaled by $1/(2\lambda)$. For practical purposes one simply takes a selection from
the set; this is denoted $u^* \in P_\Omega \bar{u}$ and $u^*$ is called a projection.

The function $\iota_\Omega$ is not the only function one could use, hence the more general
use of the terminology proximal mapping for the general function $f$ [23]

$$\operatorname{prox}_{f,\lambda}(\bar{u}) := \operatorname{argmin}_u \left\{ f(u) + \tfrac{1}{2\lambda}\|u - \bar{u}\|^2 \right\}. \tag{6.29}$$


Here the value of $\lambda$ plays the role of dialing up or down the requirement of staying
close to the point $\bar{u}$. This is often understood as a step-length parameter in the context
of algorithms: the smaller $\lambda$ is, the greater the penalty for moving away from $\bar{u}$.
The algorithm (6.15) written using the formalism of proximal mappings takes the
form

$$\psi^{k+1} \in \tfrac{1}{2}\left( R_{f_0,\lambda} R_{f_1,\lambda} + \mathrm{Id} \right)(\psi^k). \tag{6.30}$$

Here $R_{f_0,\lambda}$ and $R_{f_1,\lambda}$ are called proximal reflectors defined by

$$R_{f_0,\lambda} := 2\operatorname{prox}_{f_0,\lambda} - \mathrm{Id} \quad \text{and} \quad R_{f_1,\lambda} := 2\operatorname{prox}_{f_1,\lambda} - \mathrm{Id}. \tag{6.31}$$

This is quite liberating because now, the same basic fixed point mapping can be
applied without any changes in the mathematical theory to a much broader range of
problem types.
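Two one-dimensional examples make the definition (6.29) concrete. The sketch below assumes the scaling $\frac{1}{2\lambda}\|u - \bar{u}\|^2$ used in (6.29); the function names are hypothetical, and the brute-force minimizer over a grid is included only to check the closed forms against the definition.

```python
def prox_indicator_interval(u_bar, lo, hi, lam=1.0):
    """Prox of the indicator of [lo, hi]: the projection, whatever the value of lam."""
    return min(max(u_bar, lo), hi)

def prox_abs(u_bar, lam):
    """Prox of f(u) = |u|: soft thresholding with threshold lam."""
    if u_bar > lam:
        return u_bar - lam
    if u_bar < -lam:
        return u_bar + lam
    return 0.0

def prox_brute_force(f, u_bar, lam, grid):
    """Minimize f(u) + |u - u_bar|^2 / (2 lam) over a finite grid, straight from (6.29)."""
    return min(grid, key=lambda u: f(u) + (u - u_bar) ** 2 / (2 * lam))

grid = [i / 1000 for i in range(-5000, 5001)]
closed_form = prox_abs(1.7, 0.5)
numerical = prox_brute_force(abs, 1.7, 0.5, grid)
```

The first function shows why $\lambda$ is irrelevant for projectors, while in the second a larger $\lambda$ lets the output move further from $\bar{u}$ toward the minimizer of $f$.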

It is worthwhile spending a few moments to marvel at prox. This is a mapping
from points in a space $X$ to points in the same space; using mathematical notation
$\operatorname{prox}_{f,\lambda} : X \to X$. But look again at (6.29): this is the solution to another
optimization problem. There are two important things to notice about this observation, the first
of which is that a mapping has been created out of an optimization problem. This is
what a mathematician might call pretty. The second thing to notice is that the value of
the optimization problem, “the answer to the ultimate question of life, the universe
and everything” [24]⁵, is beside the point.


6.1.4 On Your Mark. Get Set... 


There are three groups of readers envisioned for this tutorial. The first group is 
students, of either physics or mathematics, wishing to get hands-on numerical expe- 
rience with classical algorithms for real-world problems in the physical sciences. 
The second group is optical scientists who already know what they want to do, but 
would like a repository of algorithms to see what works for their problem and what 
does not work. The third group is applied mathematicians who have new algorithmic 
ideas, but need to see how they perform in comparison to other known methods on 
real data. A stripped-down version of the ProxToolbox is used at the University of 
Göttingen to teach graduate and undergraduate courses in numerical optimization 
and mathematical imaging. What is omitted from the student version is the repos- 
itory of algorithms and some of the prox mappings— the students are expected to 
write these themselves, with some guidance. Experienced researchers, it is expected, 
will extract the parts of the toolbox they need and incorporate these into their own 
software. To make it easy to identify the pieces, the toolbox has been organized 
in a highly modular structure. The modularity comes at the cost of an admittedly 
labyrinthine structure, which is the hardest thing to master and the main goal of the 
rest of this tutorial. 


6.2 Algorithms 


The two different models discussed above, feasibility and constrained optimization
(6.19), lead to a natural classification of categories of algorithms. The development
presented here follows [25]. To underscore the fact that the algorithms can be applied
to problems other than X-ray imaging, the sets involved are denoted by $\Omega_j$ for
$j = 0, 1, 2, \ldots, m$ and the points of interest are denoted with a $u$, instead of the
context-specific notation for a wavefield $\psi$. The sets $\Omega_j$ are subsets of the model
space $\mathbb{C}^n$ (or $\mathbb{R}^n$) and, since there can be more than just two sets, as in the
case of phase diversity or ptychography, the integer $m$ is just a stand-in for the number of
images and other qualitative constraints involved in an experiment.

⁵ In case you forgot, it’s 42.


6.2.1 Model Category I: Multi-set Feasibility 


The multi-set feasibility problem is: 


Feasibility 


$$\text{Find} \quad u \in \bigcap_{j=0}^{m} \Omega_j. \tag{6.32}$$


The numerical experience is that this model format leads to the most effective 
methods for solving phase-type problems. It is important to keep in mind, however, 
that for all practical purposes the intersection above is empty, so the algorithm is not 
really solving the problem since it has no solution. 

Feasibility problems can be conveniently reformulated in an optimization format: 


$$\min_{u \in \mathbb{C}^n} \sum_{j=0}^{m} \iota_{\Omega_j}(u), \tag{6.33}$$

where $\iota_{\Omega_j}$ is the indicator function (6.28) of the set $\Omega_j$. The fact that the
intersection is empty is reflected in the fact that the optimal value to problem (6.33) is $+\infty$.
For the purposes of this tutorial anything bigger than, say, 42 will be approximately
infinity.

The easiest iterative procedure of all is the Cyclic Projections algorithm 


Algorithm 6.2.1 
Initialization. Choose $u^0 \in \mathbb{C}^n$.
General Step ($k = 0, 1, \ldots$)

$$u^{k+1} \in T_{CP}(u^k) \quad \text{where} \quad T_{CP} := P_{\Omega_0} P_{\Omega_1} \cdots P_{\Omega_m}.$$

In the context of phase retrieval with one observation and an object domain constraint
this is called the Gerchberg-Saxton algorithm [26]. An early champion of projection
methods for convex feasibility was Censor [27] who together with Cegielski
has written a nice review of the extensive literature on these methods [28]. A more
recent review of inconsistent feasibility can be found in [29]. The most complete
analysis of this algorithm for consistent and inconsistent nonconvex problems has
been established in [30] and is reviewed in Chap. 23. In the inconsistent case the
fixed points generate cycles of smallest length locally over all other possible cycles
generated by projecting onto the sets in the same order. Rates of convergence have
been established generically for problems with this structure (see [30, Example 3.6]).
Rates are important for estimating how far a particular iterate is from the solution.
The most elementary convergence rate is linear convergence, also known as geometric
or exponential convergence in various communities. A sequence $(u^k)_{k=0,1,2,\ldots}$ of
points $u^k$ is said to converge linearly (technically, Q-linearly) to a limit point $u^*$ with
a global rate $c < 1$ whenever

$$\|u^{k+1} - u^*\| \le c\,\|u^k - u^*\| \quad \forall k = 0, 1, 2, \ldots.$$
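Linear convergence is easy to observe on a toy problem. The sketch below applies Cyclic Projections to two lines through the origin in $\mathbb{R}^2$, a setting chosen only because the limit point ($u^* = 0$) and the rate are known in closed form; the function names and the angles are assumptions made for the illustration.

```python
import math

def project_line(u, theta):
    """Projection of u in R^2 onto the line through the origin with direction angle theta."""
    d = (math.cos(theta), math.sin(theta))
    t = u[0] * d[0] + u[1] * d[1]
    return (t * d[0], t * d[1])

def T_CP(u, thetas):
    """Cyclic projections P_{Omega_0} P_{Omega_1} ... P_{Omega_m}; the rightmost acts first."""
    for theta in reversed(thetas):
        u = project_line(u, theta)
    return u

thetas = [0.0, math.pi / 6]   # two lines meeting only at the origin
u = (1.0, 0.0)
ratios = []
for k in range(20):
    u_next = T_CP(u, thetas)
    # ||u^{k+1} - u*|| / ||u^k - u*|| with the limit point u* = 0
    ratios.append(math.hypot(*u_next) / math.hypot(*u))
    u = u_next
```

For this pair of lines every ratio equals $\cos^2(\pi/6) = 0.75$ up to rounding, so the sequence converges linearly with rate $c = 0.75$. The smaller the angle between the lines, the closer the rate is to 1.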


The Douglas-Rachford iteration given by (6.15) can only be applied directly to
two-set feasibility problems,

$$\text{Find} \quad x \in \Omega_0 \cap \Omega_1.$$

The fixed point iteration is given by

$$u^{k+1} \in T_{DR}u^k \quad \text{where} \quad T_{DR} := \tfrac{1}{2}\left( R_{\Omega_0} R_{\Omega_1} + \mathrm{Id} \right), \tag{6.34}$$

where $R_{\Omega_0}$ and $R_{\Omega_1}$ are generic set reflectors (see (6.13)). It is important not
to forget that, even if the feasibility problem is consistent, the fixed points of the
Douglas-Rachford Algorithm will not in general be points of intersection. Instead, the shadows
of the iterates, defined as $P_{\Omega_1}(u^k)$, $k = 0, 1, 2, \ldots$, converge to intersection points,
when these exist [19].
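The difference between a fixed point and its shadow shows up already for two convex sets in the plane. In the sketch below, $\Omega_0$ is the horizontal axis and $\Omega_1$ is the closed lower half-plane, so the intersection is the horizontal axis; the sets, the starting point and the function names are assumptions chosen so that everything can be verified by hand.

```python
def P_line(u):
    """Projector onto Omega_0 = {x in R^2 : x_2 = 0}."""
    return (u[0], 0.0)

def P_halfplane(u):
    """Projector onto Omega_1 = {x in R^2 : x_2 <= 0}."""
    return (u[0], min(u[1], 0.0))

def reflect(P, u):
    """Reflector R = 2P - Id."""
    p = P(u)
    return (2 * p[0] - u[0], 2 * p[1] - u[1])

def T_DR(u):
    """Douglas-Rachford mapping (1/2)(R_{Omega_0} R_{Omega_1} + Id)."""
    r = reflect(P_line, reflect(P_halfplane, u))
    return (0.5 * (r[0] + u[0]), 0.5 * (r[1] + u[1]))

u = (1.0, 2.0)
for _ in range(50):
    u = T_DR(u)
shadow = P_halfplane(u)
```

The starting point is itself a fixed point of `T_DR` although it lies in neither set, while its shadow lies in the intersection. This is the reason the shadow, not the iterate, is the quantity to inspect at the end of a run.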

To extend this to more than two sets, Borwein and Tam [31, 32] proposed the 
following variant: 


Algorithm 6.2.2 (Cyclic Douglas-Rachford—CDR) 
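Initialization. Choose $u^0 \in \mathbb{C}^n$.
General Step ($k = 0, 1, \ldots$)

$$u^{k+1} \in T_{CDR}u^k$$

where

$$T_{CDR} := \left( \tfrac{1}{2}\left( R_{\Omega_0} R_{\Omega_1} + \mathrm{Id} \right) \right) \left( \tfrac{1}{2}\left( R_{\Omega_1} R_{\Omega_2} + \mathrm{Id} \right) \right) \cdots \left( \tfrac{1}{2}\left( R_{\Omega_m} R_{\Omega_0} + \mathrm{Id} \right) \right).$$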

Different sequencing strategies than the one presented above are possible. In [33] 
one of the pair of sets is held fixed. This has some theoretical advantages in a convex 
setting, though no advantage was observed for phase retrieval. 

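The relaxed Douglas-Rachford algorithm (6.25) takes the general form: for
$\lambda \in (0, 1]$

$$\text{(DRA/RAAR)} \quad u^{k+1} \in \left( \tfrac{\lambda}{2}\left( R_{\Omega_0} R_{\Omega_1} + \mathrm{Id} \right) + (1 - \lambda) P_{\Omega_1} \right)(u^k). \tag{6.35}$$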


Extending this to more than two sets yields the following algorithm, which was first 
proposed in [25], where it is called CDRA. 


Algorithm 6.2.3 (Cyclic Relaxed Douglas-Rachford CDRA) 
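Initialization. Choose $u^0 \in \mathbb{C}^n$ and $\lambda \in [0, 1]$.
General Step ($k = 0, 1, \ldots$)

$$u^{k+1} \in T_{CDR\lambda}u^k$$

where

$$T_{CDR\lambda} := \left( \tfrac{\lambda}{2}\left( R_{\Omega_0} R_{\Omega_1} + \mathrm{Id} \right) + (1 - \lambda) P_{\Omega_1} \right) \left( \tfrac{\lambda}{2}\left( R_{\Omega_1} R_{\Omega_2} + \mathrm{Id} \right) + (1 - \lambda) P_{\Omega_2} \right) \cdots \left( \tfrac{\lambda}{2}\left( R_{\Omega_m} R_{\Omega_0} + \mathrm{Id} \right) + (1 - \lambda) P_{\Omega_0} \right).$$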


The analysis for RAAR and its precursor, Douglas-Rachford, is contained in [30,
Sect. 3.2.2] and [34].
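The effect of the relaxation parameter can be illustrated on a deliberately inconsistent toy problem. The sketch below assumes two parallel lines in the plane as the sets (an assumption made for illustration only) and implements the relaxed two-set operator directly; the function names are hypothetical.

```python
def relaxed_dr_step(u, proj0, proj1, lam):
    """One step of (lam/2)(R_1 R_0 + Id) + (1 - lam) P_0 for points stored as tuples."""
    p0 = proj0(u)
    r0 = tuple(2 * p - x for p, x in zip(p0, u))    # R_0 u
    p1 = proj1(r0)
    r1 = tuple(2 * p - x for p, x in zip(p1, r0))   # R_1 R_0 u
    return tuple(0.5 * lam * (a + x) + (1 - lam) * p
                 for a, x, p in zip(r1, u, p0))

proj0 = lambda u: (u[0], 0.0)  # Omega_0: the line y = 0
proj1 = lambda u: (u[0], 1.0)  # Omega_1: the parallel line y = 1, so no intersection

def run(lam, iters, u=(3.0, 5.0)):
    for _ in range(iters):
        u = relaxed_dr_step(u, proj0, proj1, lam)
    return u

u_relaxed = run(0.5, 60)  # settles at a fixed point of the relaxed operator
u_plain = run(1.0, 10)    # lam = 1 is plain Douglas-Rachford; it drifts one unit per step
print(u_relaxed, u_plain)
```

For this pair of sets the relaxed iteration with $\lambda = 1/2$ has a fixed point, whereas the unrelaxed iteration moves steadily away; this is a property of the toy example and is not a statement about phase retrieval data.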

Another popular algorithm that can be derived from the Douglas-Rachford
algorithm in the convex setting [35] is the Alternating Directions Method of
Multipliers (ADMM, [36]). For nonconvex problems like phase retrieval the direct link
between these methods is lost, though there have been some recent developments
and studies [37-39]. The ADMM algorithm falls into the category of augmented
Lagrangian-based methods. Here, problem (6.33) is reformulated as

\[ \min_{x,\,u_j \in \mathbb{C}^n} \left\{ \iota_{\Omega_0}(x) + \sum_{j=1}^{m} \iota_{\Omega_j}(u_j) \;\Big|\; u_j = x,\ j = 1, 2, \dots, m \right\}, \tag{6.36} \]

so that one can apply ADMM to the augmented Lagrangian given by

\[ L_\eta(x, u_j, v_j) = \iota_{\Omega_0}(x) + \sum_{j=1}^{m}\left( \iota_{\Omega_j}(u_j) + \langle v_j, x - u_j\rangle + \tfrac{\eta}{2}\|x - u_j\|^2 \right), \tag{6.37} \]

where $\eta > 0$ is a penalization parameter and $v_j$, $j = 1, 2, \dots, m$, are the multipliers
which are associated with the linear constraints. The ADMM algorithm applied to
finding the critical points of the corresponding augmented Lagrangian (see (6.37))
is given by


Algorithm 6.2.4 (Nonsmooth ADMM,) 
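Initialization. Choose $x^0, u_j^0, v_j^0 \in \mathbb{C}^n$ and fix $\eta > 0$.
General Step ($k = 0, 1, \dots$)

1. Update

\[ x^{k+1} \in \operatorname*{argmin}_{x \in \mathbb{C}^n} \left\{ \iota_{\Omega_0}(x) + \sum_{j=1}^{m}\left( \langle v_j^k, x - u_j^k\rangle + \tfrac{\eta}{2}\|x - u_j^k\|^2 \right) \right\} = P_{\Omega_0}\left( \frac{1}{m}\sum_{j=1}^{m}\left( u_j^k - \tfrac{1}{\eta} v_j^k \right) \right). \tag{6.38} \]

2. For all $j = 1, 2, \dots, m$ update (in parallel)

\[ u_j^{k+1} \in \operatorname*{argmin}_{u_j \in \mathbb{C}^n} \left\{ \iota_{\Omega_j}(u_j) + \langle v_j^k, x^{k+1} - u_j\rangle + \tfrac{\eta}{2}\|x^{k+1} - u_j\|^2 \right\} = P_{\Omega_j}\left( x^{k+1} + \tfrac{1}{\eta} v_j^k \right). \tag{6.39} \]

3. For all $j = 1, 2, \dots, m$ update (in parallel)

\[ v_j^{k+1} = v_j^k + \eta\left( x^{k+1} - u_j^{k+1} \right). \tag{6.40} \]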


This can be written as a fixed point iteration on triplets $(x^k, u^k, v^k) \in \mathbb{C}^n \times \mathbb{C}^{mn} \times
\mathbb{C}^{mn}$, but it is not very convenient to see things this way. Note that the projections in
Step 2 of the algorithm can be computed in parallel, while the Cyclic Projections
and Cyclic Douglas-Rachford Algorithms must be executed sequentially. Note also
that the update of the block $u^{k+1}$ incorporates the newest information from the block
$x^{k+1}$ together with the old data $v^k$, while the update of the block $v^{k+1}$ incorporates
the newest information from both blocks $x^{k+1}$ and $u^{k+1}$. This is in the same spirit
as the Gauss-Seidel method for systems of linear equations. Obviously, there is an
increase (by a factor of $3 + m$) of the number of variables, but this is a mild increase in
complexity in comparison to some recent proposals for phase retrieval which involve
squaring the number of variables! Indeed, ADMM is the starting point for just about all
the most successful methods for large-scale optimization with linear constraints (see,
for instance, [40] and references therein).
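The three update steps can be sketched for a small feasibility problem. The code below is an illustration under assumptions: the sets are three lines through the origin in the plane, the x-update projects the average of $u_j - v_j/\eta$ onto the first set, and the u-update projects $x + v_j/\eta$ onto the j-th set, which is what completing the square in the corresponding argmin problems gives. All names are hypothetical.

```python
import math

def line_projector(angle):
    """Projector onto the line through the origin with direction (cos, sin)."""
    d = (math.cos(angle), math.sin(angle))
    return lambda u: tuple((u[0] * d[0] + u[1] * d[1]) * di for di in d)

def admm_feasibility(proj0, projs, x, us, vs, eta=1.0, iters=100):
    m, n = len(projs), len(x)
    for _ in range(iters):
        # Step 1: project the average of u_j - v_j / eta onto Omega_0.
        avg = tuple(sum(u[i] - v[i] / eta for u, v in zip(us, vs)) / m for i in range(n))
        x = proj0(avg)
        # Step 2: independent projections, one per set; these could run in parallel.
        us = [proj(tuple(x[i] + v[i] / eta for i in range(n)))
              for proj, v in zip(projs, vs)]
        # Step 3: multiplier update using the newest x and u_j.
        vs = [tuple(v[i] + eta * (x[i] - u[i]) for i in range(n))
              for u, v in zip(us, vs)]
    return x, us, vs

proj0 = line_projector(0.0)
projs = [line_projector(math.pi / 4), line_projector(-math.pi / 4)]
x, us, vs = admm_feasibility(proj0, projs, (0.0, 0.0),
                             [(1.0, 2.0), (-3.0, 1.0)],
                             [(0.0, 0.0), (0.0, 0.0)])
print("x:", x)
```

The list comprehension in Step 2 makes the parallel structure explicit: no projection in that step depends on the result of another.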

An ADMM scheme for phase retrieval has appeared in [41]. This is a terrible 
algorithm for phase retrieval. It is included here, however, as a point of reference to 
the Douglas-Rachford Algorithm. 


6.2.2 Model Category II: Product Space Formulations 


The second category of algorithms is a stepping stone to smooth optimization models,
though this is not the most obvious way to motivate the strategy; the connection to
smoothing only becomes apparent after some consideration. The idea is to lift the
problem to the product space $(\mathbb{C}^n)^{m+1}$, where it can be formulated as a two-set
feasibility problem

\[ \text{Find } u^* \in \Omega \cap \mathcal{D}, \]

where $u^* = (u_0^*, u_1^*, \dots, u_m^*)$, $\Omega := \Omega_0 \times \Omega_1 \times \cdots \times \Omega_m$ and $\mathcal{D}$ is the diagonal set
of $\mathbb{C}^{n(m+1)}$, which is defined by $\{(u, u, \dots, u) : u \in \mathbb{C}^n\}$. This also involves an
increase in the number of unknowns, but only by a factor of $m$ which, while not
insignificant when $m$ is large, can be managed through clever implementation. There
are two important features of this formulation. The first is that the projection
onto the set $\Omega$ can be easily computed since

\[ P_\Omega(u) = \left(P_{\Omega_0}(u_0), P_{\Omega_1}(u_1), \dots, P_{\Omega_m}(u_m)\right), \]

where $P_{\Omega_j}$, $j = 1, 2, \dots, m$, are given in (6.11). The second important feature is
that $\mathcal{D}$ is a subspace which also has a simple projection given by $P_{\mathcal{D}}(u) = \bar{u}$ where

\[ \bar{u}_j = \frac{1}{m+1}\sum_{i=0}^{m} u_i, \quad j = 0, 1, \dots, m. \]


This formulation immediately suggests the Cyclic Projections Algorithm 6.2.1,
which, in the case of just two sets, is often called Alternating Projections.


Algorithm 6.2.5 (Alternating Projections—AP)
Initialization. Choose $u^0 \in \mathbb{C}^{n(m+1)}$.
General Step ($k = 0, 1, \dots$)

\[ u^{k+1} \in P_{\mathcal{D}} P_\Omega u^k. \]

But Algorithm 6.2.5 is equivalent to

\[ u^{k+1} \in \frac{1}{m+1}\sum_{j=0}^{m} P_{\Omega_j} u^k, \]

where $u^k \in \mathbb{C}^n$ now denotes the common block of the iterate on the diagonal;
in other words, the Alternating Projections algorithm on the product space is
equivalent to the Averaged Projections Algorithm 6.2.10 and the Alternating Minimization
Algorithm 6.2.11. Also the popular Projected Gradient method reduces to averaged
projections. To see this, consider the following minimization problem:

\[ \operatorname*{minimize}_{u \in \Omega} \ \tfrac12 \operatorname{dist}^2(u, \mathcal{D}). \tag{6.41} \]

The objective above is convex and, because $\mathcal{D}$ is a closed and convex set,
continuously differentiable with a Lipschitz continuous gradient given by $\nabla \operatorname{dist}^2(u, \mathcal{D}) =
2(u - P_{\mathcal{D}} u)$ (see (6.46)). The Projected Gradient Algorithm applied to this problem
follows easily.


Algorithm 6.2.6 (Projected Gradient—PG) 
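Initialization. Choose $u^0 \in \mathbb{C}^{n(m+1)}$.
General Step ($k = 0, 1, \dots$)

\[ u^{k+1} \in P_\Omega\left(u^k - \tfrac12 \nabla \operatorname{dist}^2(u^k, \mathcal{D})\right) \iff u^{k+1} \in P_\Omega P_{\mathcal{D}} u^k. \]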
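The relation between projections on the product space and averaged projections on the original space can be checked numerically. The following sketch assumes three toy sets in the plane (the x-axis, the unit circle and the line y = x); the sets and the helper names are illustrative assumptions.

```python
import math

def proj_xaxis(u):
    return (u[0], 0.0)

def proj_circle(u):
    r = math.hypot(u[0], u[1])
    if r == 0.0:
        return (1.0, 0.0)  # every point of the circle is nearest; pick one
    return (u[0] / r, u[1] / r)

def proj_diag_line(u):
    s = 0.5 * (u[0] + u[1])
    return (s, s)

PROJECTORS = [proj_xaxis, proj_circle, proj_diag_line]

def proj_product(blocks):
    """Projection onto the product of the sets: project block j onto set j."""
    return [proj(b) for proj, b in zip(PROJECTORS, blocks)]

def proj_diagonal(blocks):
    """Projection onto the diagonal: replace every block by the block average."""
    mean = tuple(sum(b[i] for b in blocks) / len(blocks) for i in range(2))
    return [mean for _ in blocks]

def averaged_projections(u):
    ps = [proj(u) for proj in PROJECTORS]
    return tuple(sum(p[i] for p in ps) / len(ps) for i in range(2))

u = (2.0, 1.0)
blocks = [u, u, u]                        # start on the diagonal
for _ in range(10):
    blocks = proj_diagonal(proj_product(blocks))
    u = averaged_projections(u)
difference = math.hypot(blocks[0][0] - u[0], blocks[0][1] - u[1])
print("difference between the two iterations:", difference)
```

Starting from a point on the diagonal, every block of the product-space iterate coincides with the averaged projections iterate.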


This is not surprising since the minimization problem (6.41) is equivalent to (6.48).
To see this, note that by the definition of the distance function

\[ \min_{u \in \Omega} \tfrac12 \operatorname{dist}^2(u, \mathcal{D}) = \min_{u\in\Omega}\,\min_{y \in \mathcal{D}} \tfrac12\|u - y\|^2 = \min_{u\in\Omega}\,\min_{y\in\mathbb{C}^n} \tfrac12 \sum_{j=0}^{m}\|y - u_j\|^2 = \min_{y,\,u}\left\{ \tfrac12\sum_{j=0}^{m} \|y - u_j\|^2 \;\Big|\; u_j \in \Omega_j,\ j = 0, 1, \dots, m\right\}. \]

In the convex setting, the Projected Gradient Algorithm has the advantage that it can
be accelerated [42, 43]. A Fast Projected Gradient Algorithm for problem (6.41)
looks like:


Algorithm 6.2.7 (Fast Projected Gradient—FPG) 
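Initialization. Choose $u^0, y^1 \in \mathbb{C}^{n(m+1)}$ and $\alpha_k = \frac{k-1}{k+2}$ for all $k = 0, 1, 2, \dots$.
General Step ($k = 1, 2, \dots$)

\[ u^k \in P_\Omega\left(y^k - \tfrac12\nabla\operatorname{dist}^2(y^k, \mathcal{D})\right), \qquad y^{k+1} = u^k + \alpha_k\left(u^k - u^{k-1}\right) \]
\[ \iff \]
\[ u^k \in P_\Omega P_{\mathcal{D}}\, y^k, \qquad y^{k+1} = u^k + \alpha_k\left(u^k - u^{k-1}\right). \]

There is no theory for the choice of acceleration parameter $\alpha_k$, $k = 0, 1, 2, \dots$, in
Algorithm 6.2.7 for nonconvex problems, but numerical experience [25, 44] indicates
that this works pretty well. All that is missing is an explanation.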
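The update pattern of the accelerated scheme can be sketched on the product space as follows. This is a hedged illustration: the momentum weights $(k-1)/(k+2)$ are a common choice and are an assumption here, the two sets are lines through the origin in the plane, and the helper names are hypothetical. No claim is made that the accelerated variant is faster on this toy problem.

```python
import math

def line_projector(angle):
    """Projector onto the line through the origin with direction (cos, sin)."""
    d = (math.cos(angle), math.sin(angle))
    return lambda u: tuple((u[0] * d[0] + u[1] * d[1]) * di for di in d)

def proj_product(projs, blocks):
    return [proj(b) for proj, b in zip(projs, blocks)]

def proj_diagonal(blocks):
    mean = tuple(sum(b[i] for b in blocks) / len(blocks) for i in range(2))
    return [mean for _ in blocks]

def fast_projected_gradient(projs, u0, y1, iters):
    u_prev, y = u0, y1
    for k in range(1, iters + 1):
        u = proj_product(projs, proj_diagonal(y))   # projected gradient step at y^k
        alpha = (k - 1) / (k + 2)                   # assumed momentum weight
        y = [tuple(a + alpha * (a - b) for a, b in zip(uj, pj))
             for uj, pj in zip(u, u_prev)]          # extrapolation step
        u_prev = u
    return u_prev

projs = [line_projector(0.0), line_projector(math.pi / 4)]
start = [(3.0, 4.0), (3.0, 4.0)]
u = fast_projected_gradient(projs, start, start, 300)
gap = math.hypot(u[0][0] - u[1][0], u[0][1] - u[1][1])
print("distance between the two blocks:", gap)
```

The distance between the blocks measures how far the product-space iterate is from the diagonal, which is the quantity the objective in (6.41) penalizes.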
In the product space setting the best approximation problem takes the form

\[ \operatorname*{minimize}_{u \in \mathbb{C}^{n(m+1)}} \ \frac{\lambda}{2(1-\lambda)}\operatorname{dist}^2(u, \mathcal{D}) + \iota_\Omega(u). \tag{6.42} \]

Since there are only two functions, the proximal Douglas-Rachford or the relaxed
proximal Douglas-Rachford algorithms apply to (6.42) without any tricks:


Algorithm 6.2.8 (Relaxed Douglas-Rachford—DRA/RAAR) 
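Initialization. Choose $u^0 \in \mathbb{C}^{n(m+1)}$ and $\lambda \in [0, 1]$.
General Step ($k = 0, 1, \dots$)

\[ u^{k+1} \in \tfrac{\lambda}{2}\left(R_{\mathcal{D}} R_\Omega u^k + u^k\right) + (1-\lambda) P_\Omega u^k. \tag{6.43} \]

The relaxed Douglas-Rachford Algorithm is exactly the proximal Douglas-Rachford
algorithm applied to the problem (6.42) [22]; that is,

\[ \tfrac12\left(R_1 R_\Omega + \mathrm{Id}\right) = \tfrac{\lambda}{2}\left(R_{\mathcal{D}} R_\Omega + \mathrm{Id}\right) + (1-\lambda)P_\Omega, \]

where $R_1 u := 2\operatorname{prox}_{1, f_1}(u) - u$ is the proximal reflector of the function $f_1(u) :=
\frac{\lambda}{2(1-\lambda)}\operatorname{dist}^2(u, \mathcal{D})$.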
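One way to see why this identity holds is to compute the prox of $f_1$ explicitly. The following is a sketch of the calculation, writing $\mathcal{D}$ for the diagonal subspace, assuming $\lambda \in (0, 1)$, taking the prox with unit step, and reading the operator identities selection-wise where the projector onto $\Omega$ is set-valued.

```latex
% Shorthand: a := \lambda / (1 - \lambda), so that f_1 = (a/2) dist^2(., D) and a / (1 + a) = \lambda.
\begin{align*}
\operatorname{prox}_{1,f_1}(u)
  &= \operatorname*{argmin}_{w}\Big\{ \tfrac{a}{2}\operatorname{dist}^2(w,\mathcal{D}) + \tfrac12\|w-u\|^2 \Big\}
   = \tfrac{1}{1+a}\,u + \tfrac{a}{1+a}\,P_{\mathcal{D}}u
   = (1-\lambda)\,u + \lambda\,P_{\mathcal{D}}u, \\
R_1
  &= 2\operatorname{prox}_{1,f_1} - \mathrm{Id}
   = \lambda\,(2P_{\mathcal{D}} - \mathrm{Id}) + (1-\lambda)\,\mathrm{Id}
   = \lambda\,R_{\mathcal{D}} + (1-\lambda)\,\mathrm{Id}, \\
\tfrac12\left(R_1 R_\Omega + \mathrm{Id}\right)
  &= \tfrac{\lambda}{2}\left(R_{\mathcal{D}} R_\Omega + \mathrm{Id}\right) + \tfrac{1-\lambda}{2}\left(R_\Omega + \mathrm{Id}\right)
   = \tfrac{\lambda}{2}\left(R_{\mathcal{D}} R_\Omega + \mathrm{Id}\right) + (1-\lambda)\,P_\Omega .
\end{align*}
```

The first line uses that $\mathcal{D}$ is a subspace, so the minimizer keeps the component of $u$ in $\mathcal{D}$ and shrinks the orthogonal component by the factor $1/(1+a)$; the last step uses $\tfrac12(R_\Omega + \mathrm{Id}) = P_\Omega$.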

A different kind of relaxation to the Douglas-Rachford algorithm was recently 
proposed and studied in [45]. This appears to be better than Algorithm 6.2.8. When the 
sets involved are affine, the algorithm is a convex combination of Douglas-Rachford 


and Alternating Projections, but generally it takes the form 


Algorithm 6.2.9 (Douglas-Rachford-Alternating-Projections) 
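Initialization. Choose $u^0 \in \mathbb{C}^{n(m+1)}$ and $\lambda \in [0, 1]$.
General Step ($k = 0, 1, \dots$)

\[ u^{k+1} \in P_{\mathcal{D}}\left((1+\lambda) P_\Omega u^k - \lambda u^k\right) - \lambda\left(P_\Omega u^k - u^k\right). \tag{6.44} \]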


This algorithm is denoted by DRAP in the demonstrations below. 


6.2.3 Model Category III: Smooth Nonconvex Optimization 


The next algorithm, Averaged Projections, could be motivated purely from the feasi- 
bility framework detailed above. But there is a more significant smooth interpretation 
of this model, which motivates the smooth model class. 


Algorithm 6.2.10 (Averaged Projections—AvP) 
Initialization. Choose $u^0 \in \mathbb{C}^n$.
General Step ($k = 0, 1, \dots$)

\[ u^{k+1} \in T_{AvP} u^k \quad\text{where}\quad T_{AvP} := \frac{1}{m+1}\sum_{j=0}^{m} P_{\Omega_j}. \]


The analysis of averaged projections for problems with this structure is covered 
by the analysis of nonlinear/nonconvex gradient descent. This is classical and can 
be found throughout the literature, but it is limited to guarantees of convergence to 
critical points [46, 47]. For phase retrieval it is not known how to guarantee that all 
critical points are global minimums, though this is a topic of intense interest at the 
moment. 

Although, in general, averaged projections has a slower convergence rate than its
sequential counterpart [48], there are two features that recommend this method. First,
it can be run in parallel. Secondly, it appears to be more robust to problem
inconsistency. Indeed, the Averaged Projections algorithm is equivalent to gradient-based
schemes when applied to an adequate smooth and nonconvex objective function.
This well-known fact goes back to [49] when the sets $\Omega_j$, $j = 0, 1, \dots, m$, are
closed and convex. In particular, two very prevalent schemes are in fact equivalent
to AvP.

To see this, consider the problem of minimizing the sum of squared distances to
the sets $\Omega_j$, $j = 0, 1, \dots, m$, that is,

\[ \operatorname*{minimize}_{u \in \mathbb{C}^n} \ f(u) := \frac{1}{2(m+1)}\sum_{j=0}^{m}\operatorname{dist}^2(u, \Omega_j). \tag{6.45} \]

Since the sets $\Omega_j$, $j = 0, 1, \dots, m$, are nonconvex, the functions $\operatorname{dist}^2(u, \Omega_j)$ are
clearly not differentiable, and hence neither is the objective function $f(u)$. However,
in the context of phase retrieval, the sets $\Omega_j$, $j = 0, 1, \dots, m$, are prox-regular (i.e.
the projector onto these sets is single-valued near the sets [50]). From elementary
properties of prox-regular sets [51] it can be shown that the gradient of the squared
distance is defined and differentiable with Lipschitz continuous derivative (that is,
the corresponding Hessian) up to the boundary of $\Omega_j$, $j = 0, 1, \dots, m$, and points
where the coordinate elements of the vector $u$ vanish. Indeed, for $f$ given by (6.45)

\[ \nabla f(u) = \frac{1}{m+1}\sum_{j=0}^{m}\left(\mathrm{Id} - P_{\Omega_j}\right)(u). \tag{6.46} \]

Thus, applying gradient descent with unit stepsize to problem (6.45), one
immediately recovers the averaged projections algorithm.
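The gradient formula and its link to averaged projections can be verified numerically at a point where the projectors are single-valued. The sketch below assumes two toy sets in the plane, the unit circle (nonconvex) and the line y = x; the sets, the test point and the function names are illustrative assumptions.

```python
import math

def proj_circle(u):
    r = math.hypot(u[0], u[1])
    return (u[0] / r, u[1] / r)   # single-valued away from the origin

def proj_line(u):
    s = 0.5 * (u[0] + u[1])
    return (s, s)

PROJECTORS = [proj_circle, proj_line]

def f(u):
    """Sum of squared distances divided by twice the number of sets."""
    total = 0.0
    for proj in PROJECTORS:
        p = proj(u)
        total += (u[0] - p[0]) ** 2 + (u[1] - p[1]) ** 2
    return total / (2 * len(PROJECTORS))

def grad_f(u):
    """Average of the residuals u - P_j(u)."""
    n = len(PROJECTORS)
    return tuple(sum(u[i] - proj(u)[i] for proj in PROJECTORS) / n for i in range(2))

def averaged_projection(u):
    n = len(PROJECTORS)
    return tuple(sum(proj(u)[i] for proj in PROJECTORS) / n for i in range(2))

u = (2.0, 0.5)
g = grad_f(u)
unit_step = (u[0] - g[0], u[1] - g[1])   # gradient descent with unit stepsize
avg = averaged_projection(u)

h = 1e-6                                  # central finite differences as a sanity check
fd = ((f((u[0] + h, u[1])) - f((u[0] - h, u[1]))) / (2 * h),
      (f((u[0], u[1] + h)) - f((u[0], u[1] - h))) / (2 * h))
print(unit_step, avg, fd, g)
```

The unit gradient step and the averaged projection agree up to rounding, and the finite-difference quotient is close to the closed-form gradient at this point.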

The objective in (6.45) is as nice as one could hope for: it has full domain, is smooth
and nonnegative and has the value zero at points of intersection. These kinds of models
are for obvious reasons favored in applications; unfortunately, these reasons are a
little old-fashioned considering today’s mathematical technology for dealing with
nonsmooth objectives like (6.33).

Another way to approach problem (6.45) underscores connections with another
fundamental algorithmic strategy. Consider the following problem:

\[ \min_{u \in \mathbb{C}^n} \ f(u) := \frac{1}{2(m+1)}\sum_{j=0}^{m}\operatorname{dist}^2(u, \Omega_j). \tag{6.47} \]

Using the definition of the function $\operatorname{dist}^2(\cdot, \Omega_j)$, $j = 0, 1, \dots, m$, problem (6.47) is
equivalent to

\[ \min_{y,\,u}\left\{ \sum_{j=0}^{m}\tfrac12\|y - u_j\|^2 \;\Big|\; u_j \in \Omega_j,\ j = 0, 1, \dots, m \right\}, \tag{6.48} \]

where $u = (u_0, u_1, \dots, u_m) \in (\mathbb{C}^n)^{m+1}$. The number of variables has now increased
$(m + 1)$-fold, which, for applications like ptychography, starts to get worrying since
$m$ can be large. But this is more a conceptual issue than a practical one.

Problem (6.48) always has an optimal solution (the objective is continuous and
the constraint set is closed and bounded, so by a theorem of Weierstrass the
minimum is attained). The optimization problem (6.48) consists of constraint sets
which are separable over the variables $u_j$, $j = 0, 1, \dots, m$; this can be exploited to
divide the optimization problem into a sequence of easier subproblems. Alternating
Minimization (AM) does just this, and involves updating each variable sequentially:


Algorithm 6.2.11 (Alternating Minimization—AM) 


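Initialization. Choose $(y^0, u_0^0, u_1^0, \dots, u_m^0) \in \mathbb{C}^n \times \mathbb{C}^{n(m+1)}$.
General Step ($k = 0, 1, \dots$)
1. Update

\[ y^{k+1} = \operatorname*{argmin}_{y \in \mathbb{C}^n} \sum_{j=0}^{m}\tfrac12\|y - u_j^k\|^2 = \frac{1}{m+1}\sum_{j=0}^{m} u_j^k. \tag{6.49} \]

2. For all $j = 0, 1, \dots, m$ update (in parallel)

\[ u_j^{k+1} \in \operatorname*{argmin}_{u_j \in \Omega_j} \tfrac12\|u_j - y^{k+1}\|^2 = P_{\Omega_j}\left(y^{k+1}\right). \tag{6.50} \]

By combining (6.49) and (6.50), the algorithm is written compactly as

\[ y^{k+1} \in \frac{1}{m+1}\sum_{j=0}^{m} P_{\Omega_j}\left(y^k\right), \tag{6.51} \]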


which is just averaged projections, Algorithm 6.2.10! When m = 1, i.e., only one 
image is considered, the Alternating Minimization Algorithm above coincides with 
what is known as the Error Reduction Algorithm [52] in the optics community. 

In [53] Marchesini studied an augmented Lagrangian approach to solving 


a (I(2;@),| - vB) - (6.52) 


pec 2n 
j=0 i=1 


This is a nonsmooth least-squares relaxation of problem (6.2). Generic linear least
squares problems take the form

\[ \min_{u \in \mathbb{C}^n} \ \tfrac12\sum_{j=0}^{m}\sum_{i=1}^{n}\left(|(F_j u)_i| - b_{ij}\right)^2, \tag{6.53} \]

where $F_j$ is a linear mapping from $\mathbb{C}^n$ to $\mathbb{C}^n$ and $b_{ij}$ are positive scalars. When the
sets $\Omega_j$ are defined by

\[ \Omega_j := \left\{u \in \mathbb{C}^n \;\big|\; |(F_j u)_i| = b_{ij},\ i = 1, 2, \dots, n\right\} \tag{6.54} \]

and the variables $y \in \mathbb{C}^n$ and $u = (u_1, u_2, \dots, u_m) \in \mathbb{C}^{nm}$ with $u_j \in \mathbb{C}^n$ satisfy $y = u_j$
for each $j = 1, 2, \dots, m$, the resulting primal-dual/ADMM Algorithm takes the
form:


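Algorithm 6.2.12 (AvP$^2$)

Initialization. Choose any $y^0 \in \mathbb{C}^n$ and $\rho_j > 0$, $j = 0, 1, \dots, m$. Compute
$u_j^1 \in P_{\Omega_j}(y^0)$ ($j = 0, 1, \dots, m$) and $y^1 := \frac{1}{m+1}\sum_{j=0}^{m} u_j^1$.
General Step. For each $k = 1, 2, \dots$ generate the sequence $\{(y^k, u^k)\}_{k \in \mathbb{N}}$
as follows:

• Compute

\[ y^{k+1} = \frac{1}{m+1}\sum_{j=0}^{m}\left(u_j^k + \frac{1}{\rho_j}\left(y^k - y^{k-1}\right)\right). \tag{6.55} \]

• For each $j = 1, 2, \dots, m$, compute

\[ u_j^{k+1} = P_{\Omega_j}\left(u_j^k + \frac{1}{\rho_j}\left(2y^k - y^{k-1}\right)\right). \tag{6.56} \]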


This algorithm can be viewed as a smoothed/relaxed version of Algorithm 6.2.4, but,
when you look at it for the first time, the most obvious thing that jumps out at you
is that this is averaged projections with a two-step recursion. This is why it has been
called AvP$^2$ in [25].

The more general PHeBIE Algorithm 6.2.13 applied to the problem of blind
ptychography [54] reduces to the Averaged Projections Algorithm for phase retrieval
when the illuminating field is known. To derive this method, note that for any
fixed $y$ and $u$, the function $z \mapsto F(z, y, u)$ given by (6.21) is continuously
differentiable and its partial gradient, $\nabla_z F(z, y, u)$, is Lipschitz continuous with
modulus $L_z(y, u)$. The same assumption holds for the function $y \mapsto F(z, y, u)$ when $z$
and $u$ are fixed. In this case, the Lipschitz modulus is denoted by $L_y(z, u)$. Define
$L'_z(y, u) := \max\{L_z(y, u), \eta_z\}$ where $\eta_z$ is an arbitrary positive number. Similarly
define $L'_y(z, u) := \max\{L_y(z, u), \eta_y\}$ where $\eta_y$ is an arbitrary positive number. The
constants $\eta_z$ and $\eta_y$ are used to address the following issue: if the Lipschitz constants
$L_z(y, u)$ and/or $L_y(z, u)$ are zero then one should replace them with positive
numbers (for the sake of well-definedness of the algorithm). In practice, it is better to
choose them to be small numbers, but for the analysis they can be chosen arbitrarily.
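
Algorithm 6.2.13 (Proximal Heterogeneous Block Implicit-Explicit)
Initialization. Choose $\alpha, \beta > 1$, $\gamma > 0$ and $(z^0, y^0, u^0) \in X \times O \times M$.
General Step ($k = 0, 1, \dots$)

1. Set $\alpha^k = \alpha L'_z(y^k, u^k)$ and select

\[ z^{k+1} \in \operatorname*{argmin}_{z \in X}\left\{ \left\langle z - z^k, \nabla_z F(z^k, y^k, u^k)\right\rangle + \frac{\alpha^k}{2}\|z - z^k\|^2 \right\}. \tag{6.57} \]

2. Set $\beta^k = \beta L'_y(z^{k+1}, u^k)$ and select

\[ y^{k+1} \in \operatorname*{argmin}_{y \in O}\left\{ \left\langle y - y^k, \nabla_y F(z^{k+1}, y^k, u^k)\right\rangle + \frac{\beta^k}{2}\|y - y^k\|^2 \right\}. \tag{6.58} \]

3. Select

\[ u^{k+1} \in \operatorname*{argmin}_{u \in M}\left\{ F(z^{k+1}, y^{k+1}, u) + \frac{\gamma}{2}\|u - u^k\|^2 \right\}. \tag{6.59} \]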


Algorithm 6.2.13, referred to as PHeBIE in Sect. 6.3.3, can be interpreted as a
combination of the algorithm proposed in [55] and a slight generalization of the
PALM Algorithm [47]. In the context of blind ptychography (6.4), the block of
variables $y$ is replaced with the object $y$ and the function $F$ is the least squares objective
(6.21). A partially preconditioned version of PALM was studied in [56] for phase
retrieval, with improved performance over PALM. The regularization parameters $\alpha^k$
and $\beta^k$, $k = 0, 1, 2, \dots$, are discussed in [54]. These parameters are inversely
proportional to the step size in Steps (6.57) and (6.58) of the algorithm. Noting that $\alpha^k$
and $\beta^k$, $k = 0, 1, 2, \dots$, are directly proportional to the respective partial Lipschitz
moduli, the larger the partial Lipschitz moduli the smaller the step size, and hence
the slower the algorithm progresses.

This brings to light an advantage of blocking strategies that is discussed in 
Chap. 12: algorithms that exploit block structures inherent in the objective function 
achieve better numerical performance by taking heterogeneous step sizes optimized 
for the separate blocks. There is, however, a price to be paid in the blocking strate- 
gies that are explored here: namely, they result in procedures that pass sequentially 
between operations on the blocks, and as such are not immediately parallelizable. The 
ptychography application is very generous in that it permits parallel computations 
on highly segmented blocks. 

Nonsmooth analysis can be applied to the objective in (6.53), but this is still not
mainstream enough to be the stuff of normal graduate training, so it remains rather
exotic. A popular way around this is to formulate (6.53) as a system of quadratic
equations:

\[ |(F_j u)_i|^2 = b_{ij}^2, \quad \forall j = 1, 2, \dots, m, \ \forall i = 1, 2, \dots, n. \tag{6.60} \]

The corresponding squared least squares residual of the quadratic model is plenty
smooth:

\[ \min_{u \in \mathbb{C}^n} \ G(u) := \tfrac12\sum_{j=0}^{m}\sum_{i=1}^{n}\left(|(F_j u)_i|^2 - b_{ij}^2\right)^2. \tag{6.61} \]

There is a trick from conic programming that allows you to recast a quadratic
equation on $\mathbb{R}^n$ as a linear equation on the space of matrices, $\mathbb{R}^{n \times n}$. The idea in the
context of phase retrieval is called phase lift. The problem here is that, though the
quadratic equation has been replaced by a linear equation (albeit in a much larger
space), the desired solution is rank 1. Even though the set of fixed rank matrices is a
manifold, it is not convex, so there is conservation of difficulty. The way around this
is to replace the rank constraint with a norm, otherwise known as convex relaxation.
The reasoning here is that, for convex problems, all local solutions are also global
solutions to the problem; so you solve your convex problem and, poof, you have
the global solution to the nonconvex problem under certain (hard to verify)
conditions that guarantee the correspondence of the two problems. This is attractive as
an analytical strategy, but as an algorithmic strategy it is not practical. Blumensath
and Davies [57, 58] were the first to ask whether the conditions
that guarantee correspondence of the nonconvex problem and its convex relaxation
are also sufficient to guarantee that the nonconvex problem doesn’t have any critical
points other than global minima. They answered this question in the affirmative for
the projected gradient Algorithm 6.2.6 and Hesse, Luke and Neumann showed that
this is also the case for alternating projections Algorithm 6.2.5 [59]. So, there is no
need to resort to convex relaxations, which is good news indeed since the phase lift
method is not implementable on standard consumer-grade architectures for any of
the Göttingen data sets.

Other methods based on the quartic objective have gained popularity in the newer
generation of phase retrieval studies in the applied mathematics community. Notable
among these are methods called Wirtinger flow. Smoothness makes the analysis
easier, but the quartic objective has almost no curvature around critical points, which
makes convergence of first order methods much slower than first order methods
applied to nonsmooth objectives. See [14, Sect. 5.2] for a discussion of this and [25]
for numerical comparisons.
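The lifting trick mentioned above can be made concrete in a few lines. The sketch below works with real vectors and hypothetical data; it shows that a squared inner product is a linear function of the rank-one matrix built from the unknown, and that this linearity is in the matrix variable, not in the original vector.

```python
def outer(x):
    """Rank-one lift X = x x^T."""
    return [[xi * xj for xj in x] for xi in x]

def quadratic_measurement(a, x):
    """The quadratic quantity (a . x)^2."""
    return sum(ai * xi for ai, xi in zip(a, x)) ** 2

def lifted_measurement(a, X):
    """The same quantity written as sum_ij a_i a_j X_ij, which is linear in X."""
    n = len(a)
    return sum(a[i] * a[j] * X[i][j] for i in range(n) for j in range(n))

a = [1.0, -2.0, 3.0]
x1 = [2.0, 1.0, -1.0]
x2 = [0.0, 1.0, 1.0]
X1, X2 = outer(x1), outer(x2)
X_sum = [[p + q for p, q in zip(r1, r2)] for r1, r2 in zip(X1, X2)]

print(quadratic_measurement(a, x1), lifted_measurement(a, X1))
print(lifted_measurement(a, X_sum),
      lifted_measurement(a, X1) + lifted_measurement(a, X2))
```

The price of linearity is visible in the storage: the lifted unknown has n squared entries, and the sum of two rank-one lifts is in general no longer rank one.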


Accelerations. The formulation of problem (6.45) has a fixed weight between the
various distances. An extension of this is a dynamically weighted average between
the projections to the sets $\Omega_j$, $j = 0, 1, \dots, m$. This idea was proposed in [14]
where it is called extended least squares. In the context of sensor localization a
similar approach was also proposed in [60] where it is called Sequential Weighted
Least Squares (SWLS). The underlying model in [14] is the negative log-likelihood
measure of the sum of squared set distances:

\[ \operatorname*{minimize}_{u \in \mathbb{C}^n} \ \sum_{j=0}^{m}\ln\left(\operatorname{dist}^2(u, \Omega_j) + c\right), \quad (c > 0). \tag{6.62} \]

Gradient descent applied to this objective yields what was called the Dynamically
Reweighted Averaged Projections Algorithm (DyRePr) in [25].


Algorithm 6.2.14 (Dynamically Reweighted Averaged Projections) 
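Initialization. Choose $u^0 \in \mathbb{C}^n$ and $c > 0$.
General Step ($k = 0, 1, \dots$)

\[ u^{k+1} \in u^k - \sum_{j=0}^{m}\frac{2}{\operatorname{dist}^2(u^k, \Omega_j) + c}\left(u^k - P_{\Omega_j} u^k\right). \tag{6.63} \]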

The smoothness of the sum of squared distances (almost everywhere) opens the 
door to higher-order techniques from nonlinear optimization that accelerate the basic 
gradient descent method. Quasi-Newton methods, for instance, would do the trick, 
and as observed in [61], they work unexpectedly well even on nonsmooth problems. 


Algorithm 6.2.15 (Limited Memory BFGS with Trust Region) 


1. (Initialization) Choose $\eta > 0$, $\zeta > 0$, $\bar{\ell} \in \{1, 2, \dots, n\}$, $u^0 \in \mathbb{C}^n$, and set
$\nu = \ell = 0$. Compute $\nabla f(u^0)$ and $\|\nabla f(u^0)\|$ for

\[ f(u) := \frac{1}{2(m+1)}\sum_{j=0}^{m}\operatorname{dist}^2(u, \Omega_j), \qquad \nabla f(u) := \frac{1}{m+1}\sum_{j=0}^{m}\left(\mathrm{Id} - P_{\Omega_j}\right)(u). \]

2. (L-BFGS step) For each $k = 0, 1, 2, \dots$ if $\ell = 0$ compute $u^{k+1}$ by some
line search algorithm; otherwise compute

\[ s^k = -\left(M^k\right)^{-1}\nabla f(u^k), \]

where $M^k$ is the L-BFGS update [62], $u^{k+1} = u^k + s^k$, $f(u^{k+1})$, and the
predicted change (see, for instance [63]).

3. (Trust Region) If $\rho(s^k) < \eta$, where

\[ \rho(s^k) = \frac{\text{actual change at step } k}{\text{predicted change at step } k}, \]

reduce the trust region $\Delta^k$, solve the trust region subproblem for a new
step $s^k$ [64], and return to the beginning of Step 2. If $\rho(s^k) \ge \eta$ compute
$u^{k+1} = u^k + s^k$ and $f(u^{k+1})$.

4. (Update) Compute $\nabla f(u^{k+1})$, $\|\nabla f(u^{k+1})\|$,

\[ y^k := \nabla f(u^{k+1}) - \nabla f(u^k), \qquad s^k := u^{k+1} - u^k, \]

and $s^{kT} y^k$. If $s^{kT} y^k \le \zeta$, discard the vector pair $\{s^{k-\ell}, y^{k-\ell}\}$ from storage,
set $\ell = \max\{\ell - 1, 0\}$, $\Delta^{k+1} = \infty$, $\mu^{k+1} = \mu^k$ and $M^{k+1} = M^k$ (i.e. shrink
the memory and don’t update); otherwise set $\mu^{k+1} = \frac{y^{kT} y^k}{s^{kT} y^k}$ and $\Delta^{k+1} = \infty$,
add the vector pair $\{s^k, y^k\}$ to storage, and if $\ell = \bar{\ell}$, discard the vector pair
$\{s^{k-\ell}, y^{k-\ell}\}$ from storage. Update the Hessian approximation $M^{k+1}$ [62].
Set $\ell = \min\{\ell + 1, \bar{\ell}\}$, $\nu = \nu + 1$ and return to Step 2.


This looks complicated but is standard in nonlinear optimization. Convergence is 
still unexplained for the limited memory implementation. 


6.3 ProxToolbox—A Platform for Creative Hacking 


A platform for collecting and working with data should satisfy several objectives: 


data transfer 

sharing data processing algorithms 

comparing the performance of different algorithmic approaches 
teaching 

innovation. 


The ProxToolbox has been used within the Collaborative Research Center Nanoscale 
Photonic Imaging (SFB 755) at the University of Göttingen for each of the points 
above. It is written to be able to incorporate new problems, data, and algorithms 
without abandoning the old knowledge. This type of built-in knowledge retention 
requires a structure that is burdensome for single-purpose users. Most colleagues and 
students prefer to cannibalize the ProxToolbox—hacking is positively encouraged. 
This tutorial and the demos in the toolbox are intended to put the user on a fast track 
to successfully disassembling and re-purposing the basic elements. 

Our presentation of the toolbox here is without specific reference to commands 
and code to prevent this tutorial from being outdated within a few months. Certain 
aspects of the code will change as new applications and new features get added 
to the toolbox, but what will not change is the compartmentalization of various 
mathematically and computationally distinct tools. 

To download the toolbox and the data go to 


http://num.math.uni-goettingen.de/proxtoolbox/ 


Here you will find links to the Matlab and Python versions of the toolbox, along with 
documentation and literature. The toolbox has the following organizational structure: 

• Nanoscale Photonic Imaging demos
• Algorithms
• ProxOperators
• Drivers/Problems
  – Phase
    - Demos
    - DataProcessors
    - ProxOperators
  – Ptychography
    - Demos
    - DataProcessors
    - ProxOperators
• Utilities
• InputData
• Documentation


The Nanoscale Photonic Imaging demos folder. This folder contains scripts to gen- 
erate the figures shown in this tutorial. This is the rabbit you will follow down the 
hole. 


The Algorithms folder. This folder contains a general algorithm wrapper that loops 
through the iterations calling the desired algorithm. This is exactly T in (6.26), 
and the T, indicates which specific algorithm is run, from Algorithm 6.2.1 through 
Algorithm 6.2.9. After the specific fixed point operator is applied, a specialized 
iterate monitor is called. This will depend both on the problem and the algorithm 
being run. The default is a generic iterate monitor that merely checks the distance 
between successive iterates. By default, the stopping criterion for the fixed point 
iteration is when the step between successive iterates falls below a tolerance given 
by the user. But for some algorithms and some problems, this may not be the best 
or most informative data about the progress of the iteration. For instance, if the 
problem is a feasibility problem (6.32), then the feasibility iterate monitor not only 
computes the difference between successive iterates, but also the distance between 
sets (the gap) at a given iterate. In this context, a reasonable basis for comparing
algorithms is not the step size, but rather the gap achieved by the different
algorithms. If one is running a Douglas-Rachford-type algorithm on a feasibility
problem, then as explained above, the iterates themselves don’t have to converge, 
but their shadows, defined as the projection of the iterates onto one of the sets, will 
give a good indication of convergence of some form. Still other algorithms, like 
ADMM 6.2.12, generate several sequences of iterates (three in the case of ADMM), 
only two of which converge nicely when everything goes well [65]. As much as 
possible, the iterate monitoring is automated so that the user does not have to bother 
with this. But users who are interested in algorithm development will want to pay 
close attention to this. 
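The division of labor between the iteration loop and the iterate monitor can be sketched as follows. This is a hypothetical illustration and not the ProxToolbox interface: the names `run_fixed_point` and `feasibility_monitor`, the dictionary keys and the toy sets are all assumptions made for the example.

```python
import math

def distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def feasibility_monitor(proj_a, proj_b):
    """Build a monitor that records the step size and the gap between the sets."""
    def monitor(u_old, u_new):
        shadow = proj_a(u_new)
        return {"step": distance(u_old, u_new),
                "gap": distance(shadow, proj_b(shadow))}
    return monitor

def run_fixed_point(T, u0, monitor, tol=1e-10, max_iter=1000):
    """Generic loop: apply T, call the monitor, stop when the step is below tol."""
    u, history = u0, []
    for _ in range(max_iter):
        u_new = T(u)
        history.append(monitor(u, u_new))
        u = u_new
        if history[-1]["step"] < tol:
            break
    return u, history

proj_a = lambda u: (u[0], 0.0)       # toy set A: the line y = 0
proj_b = lambda u: (u[0], 1.0)       # toy set B: the line y = 1 (inconsistent problem)
T = lambda u: proj_a(proj_b(u))      # alternating projections as the fixed point operator
u_final, history = run_fixed_point(T, (2.0, 5.0), feasibility_monitor(proj_a, proj_b))
print(u_final, history[-1])
```

In this inconsistent toy problem the step size drops to zero while the gap stays at one, which is why the two quantities are worth monitoring separately.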

The ProxOperators folder. Some prox operators are generic, like a projector onto the
diagonal $\mathcal{D}$ of a product space (otherwise known as averaging), or the prox of the
$\ell_1$-norm (soft thresholding), or the prox of the $\ell_0$-function (hard thresholding). General
prox operators are stored here. These always map an input to another point in the same
space, but how they do this depends on the structure of the input u (array, dimension,
etc.). Some problems involve prox mappings that are specific to that problem, like
Sudoku. These specific prox mappings are stored under the Drivers/Problems folder.
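The three generic operators named above have simple closed forms. The following is a minimal sketch for real vectors stored as lists; the function names are hypothetical, and the hard-thresholding cutoff assumes the prox is taken with respect to half the squared Euclidean distance.

```python
import math

def soft_threshold(x, tau):
    """Prox of tau * (l1-norm): shrink every entry toward zero by tau."""
    return [math.copysign(max(abs(xi) - tau, 0.0), xi) for xi in x]

def hard_threshold(x, tau):
    """Prox of tau * (number of nonzeros): keep an entry only if x_i^2 / 2 > tau."""
    cutoff = math.sqrt(2.0 * tau)
    return [xi if abs(xi) > cutoff else 0.0 for xi in x]

def project_diagonal(blocks):
    """Projection onto the diagonal of a product space: average the blocks."""
    n = len(blocks[0])
    mean = [sum(b[i] for b in blocks) / len(blocks) for i in range(n)]
    return [list(mean) for _ in blocks]

x = [3.0, -0.5, 1.0]
soft = soft_threshold(x, 1.0)
hard = hard_threshold(x, 0.5)
avg = project_diagonal([[1.0, 2.0], [3.0, 6.0]])
print(soft, hard, avg)
```

Each operator returns an object of the same shape as its input, which is the property that lets a generic algorithm wrapper call them interchangeably.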


The Drivers/Problems folder. This is the folder where the specific problem instances 
are stored. The problems that are of interest for this chapter are phase and ptychogra- 
phy, though there are other problems, like computed tomography, sensor localization 
and Sudoku. The Phase subfolder contains a general problem family handler called 
“phase”. Since all phase problems have similar features, this problem handler makes 
sure that all the inputs and outputs are processed in the same way. The toolbox works 
through input files stored in the demos subfolder. The input files contain names of 
data sets, data processors, algorithm names and parameters, and other user defined 
parameters like stopping tolerances, output choices and so forth. The input files might 
be augmented by a graphical user interface in the future. The link between the exper- 
imentalist and the mathematician is through the data processor. The data processor 
for the Göttingen data sets is easily identified and contains all the required parameter
values for specific experiments conducted at the Institute for Physics in Göttingen. 
The data that the processor manipulates is not contained in the ProxToolbox release, 
but is stored separately and must be downloaded from the links provided on the Prox- 
Toolbox homepage. Prox operators specific to phase retrieval, such as the projection 
onto the intensity data (6.11), are also stored at this level. 

The InputData folder. The data, which is intended to be stored or linked in the 
directory “InputData”, is not included in the software toolbox in the interest of 
portability. This tutorial will only cover demonstrations with the Phase datasets and 
the Ptychography datasets. As these sets grow and develop, the links may change to 
reflect different hosts. 


The Utilities folder. This is where generic image and data manipulation tools are 
stored. 


6.3.1 Coffee Break 


The first walk through the Toolbox is demonstrated on an image set produced by 
undergraduates in Tim Salditt’s laboratory at the Institute for X-ray Physics at the 
University of Göttingen. The data is the CDI intensity datafile contained in the Phase
dataset linked to the ProxToolbox homepage. There is a demonstration of the Cyclic 
Projections Algorithm 6.2.1 in the folder Nanoscale Photonic Imaging demos. To 
run the demonstration, just type Coffee demo at the Matlab prompt (assuming Matlab 
has the demo folder and all the data folders in its path) or python Coffee demo.py 
at the shell prompt if you are working with Python. The data set presented here 
($I_{ij}$ in problem (6.2) with $n = 128^2$) is a diffraction image of visible light shown in
Fig. 6.1a (log intensity scale). The physical parameters of the image (magnification 
factor, Fresnel number, etc.) are not given, so the easiest thing to do is to assume 
a perfect imaging system and expert experimentalists. The imaging model is then 
just an unmodified Fourier transform, that is, far-field imaging. The object was a 
real, nonnegative obstacle, supported on some patch in the object plane, so that the 
qualitative constraint O is of the form (6.23). The only way to know that the algorithm 
is converging at least to a local best approximation point is to monitor the successive 
iterates and feasibility gap (Fig. 6.1b-c). A small feasibility gap is not necessarily
desirable, since this also means that the noise is being faithfully recovered. For the
demonstration shown here, a low-pass filter is applied to the data since almost all 
of the recoverable information about the object is contained in the low-frequency 
elements. It might seem counterintuitive, but the larger the feasibility gap (i.e., the 
more inconsistent the problem is), the faster the algorithm converges. In Chap. 23 
this is explained. The original object was a coffee cup which the generous reader can 
see if he tilts his head to the left and squints really hard (Fig. 6.1d right). When you 
run the demo, don’t be surprised if your reconstruction is an upside down version of 
what is shown here—this is a symptom of nonuniqueness of solutions to the phase 
problem. 
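The nonuniqueness behind the upside-down reconstruction can be reproduced in one dimension. The sketch below is a toy analog under assumptions: a hypothetical real, nonnegative signal of length eight and a discrete Fourier transform written out in pure Python. Reversing the signal about the origin (the one-dimensional counterpart of a 180 degree rotation) leaves the Fourier modulus unchanged.

```python
import cmath

def dft_modulus(x):
    """Moduli of the discrete Fourier transform of a short sequence."""
    n = len(x)
    return [abs(sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n)))
            for k in range(n)]

signal = [0.0, 1.0, 3.0, 2.0, 0.0, 0.0, 0.0, 0.0]   # hypothetical object
flipped = [signal[(-t) % len(signal)] for t in range(len(signal))]

mod_signal = dft_modulus(signal)
mod_flipped = dft_modulus(flipped)
largest_difference = max(abs(a - b) for a, b in zip(mod_signal, mod_flipped))
print("largest difference in Fourier modulus:", largest_difference)
```

Since the flipped signal is again real and nonnegative with a support of the same size, intensity data alone cannot distinguish the two.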


Fig. 6.1 a Observation (log scale) from optical diffraction experiment. b Step-size and c gap size
between constraint sets versus iteration for several algorithms [25]. d Typical recovery from the
algorithms


6.3.2 Star Power


The next demonstration is of the reconstruction of a test object (the Siemens star) 
from near field X-ray data provided by Tim Salditt’s laboratory at the Institute for 
X-ray Physics at the University of Göttingen. Here the structured illumination shown 
in Fig. 6.2a left is modeled by $D_j$, $j = 1, 2, \dots, m$, in problem (6.3) with $m = 1$. The
image shown in Fig. 6.2a right is in the near field, so the mapping DF in problem (6.2) 
is the near-field Fresnel transform [66]. In the model (6.3) this image is represented 


by $I_j$, $j = 1, 2, \dots, m$, with $m = 1$. The qualitative constraint is that the object is
a pure phase object, that is, the field in the object domain has amplitude 1 at each 
pixel. 


A reconstruction with this data that does not take noise into account is shown 
in Fig. 6.3. What is remarkable here is that if one only looks at the convergence 
of the algorithms and judges by the achieved gap before termination, it appears 
that the quasi-Newton accelerated average projections algorithm (QNAvP) is clearly 
the best (Fig. 6.3b-c). But when you look at the reconstructions (Fig. 6.3a), the QNAvP 
reconstruction is the worst. The problem here is that the noise has also been faithfully 
recovered. 

In [67] a regularization strategy is proposed that blows a ball around the data (either 
Euclidean or Kullback-Leibler, as appropriate) and takes any reasonable point within 
the ball. The justification is that, if the data is noisy anyway, you don’t want to match 
it exactly. For a more precise development of this intuition see Chap. 4. Though it is 
not obvious, projecting onto such fattened data sets is more expensive than projecting 
onto the original noisy data. The latter is viewed as an approximate and extrapolated 
projection onto the fattened set. The algorithm is terminated when the iterates are 
within a tolerable distance of the data. A demonstration of this is shown in Fig. 6.4. 
The theoretical justification for this strategy is quite technical, but effectively what 
one is doing is running the old algorithms with early termination. 

Fig. 6.2 Near field X-ray holography experiment with a Siemens test object [25]. a The empty 
beam. b Observed pattern. c Initial guess for object amplitude. d Initial guess for object phase 

Fig. 6.3 a Reconstruction of regularized near field holography experiment with empty beam 
correction for the same data shown in Fig. 6.2. b Step-size and c gap between the constraint sets 
versus iteration [25] 


6.3.3 E Pluribus Unum 


The demonstration Ptychography_demo shown in Fig. 6.5 computes the probe and 
object from a far-field raster scan of 676 overlapping patches of the Siemens star, 
illuminated by a narrow X-ray beam. The mathematical problem is to minimize 
the objective function given in (6.3). The demonstration shows how the PHeBIE 
algorithm 6.2.13 does this. Since blind deconvolution has many local solutions, the 
process has two phases: the first is conventional phase retrieval on the data with a probe 
ansatz, the second is simultaneous phase retrieval and probe determination in 
what is essentially a nonlinear blind deconvolution. In the Ptychography_demo the 
first phase is executed with the DRA algorithm on the product space: 676 images 
of size ($192^2$), accounting for 26 equal translations of the beam side to side and 26 
equal shifts of the beam top to bottom. The second phase is executed with the PHeBIE 
algorithm starting from the last iterate of the first phase. 

Fig. 6.4 a Reconstruction of regularized (i.e. early termination) near field holography experiment 
with empty beam correction for the same data shown in Fig. 6.2a. b Step-size and c gap between 
the constraint sets versus iteration 


6.4 Last Word 


One of the most rewarding things about participating in the Nanoscale Photonic 
Imaging Collaborative Research Center at the University of Göttingen has been 
working with scientists from different disciplines with different sensibilities and 
intuition. Collaboration starts with mutual respect and an openness for new ways of 
thinking about things. This has resulted in better mathematics and better science, 
both grounded in real world experience but with an attention to abstract structures. 
This has forced the examination of aspects of abstract models that, at first glance, 
don’t seem that important, but turn out to be decisive in practice. 

Fig. 6.5 a Initial probe and warm start object initialization of a far field ptychography experiment 
for the scan data. b Probe and object reconstructed by the PHeBIE algorithm 6.2.13. c Step size 
and objective function values versus iteration 


Progress and Perspectives 


Chapter 7 
Quantifying Molecule Numbers in STED/RESOLFT Fluorescence Nanoscopy 


Jan Keller-Findeisen, Steffen J. Sahl and Stefan W. Hell 


Abstract Quantification of the numbers of molecules of interest in the spec- 
imen has emerged as a powerful capability of several fluorescence nanoscopy 
approaches. Carefully relating the measured signals from STED or RESOLFT scan- 
ning nanoscopy data to the contribution of a single molecule, reliable estimates of 
fluorescent molecule numbers can be obtained. To achieve this, higher-order signa- 
tures in the obtained photon statistics are analyzed, as arise from the antibunched 
nature of single-fluorophore emissions or in the signal variance among multiple 
on/off-switching cycles. In this chapter, we discuss the concepts and approaches 
demonstrated to date for counting molecules in STED/RESOLFT nanoscopy. 


7.1 Introduction 


Ideally, a microscope discerns and maps all features in the specimen. Until the dawn 
of the 21st century, it was generally accepted that an optical microscope using lenses 
would not be able to discern features on the nanoscale [1]. By the mid 1990s, how- 
ever, physical concepts for overcoming the longstanding diffraction barrier had been 
introduced, and the development of super-resolved far-field fluorescence microscopy 
(or ‘nanoscopy’) has since progressed tremendously. At a deep level, the reason for 
the vastly increased resolution capabilities of modern fluorescence nanoscopes—to 
resolve molecules with distances of even just a few nanometers—has been a major 
paradigm shift: the discrimination of molecules is no longer realized by the focusing 
of the light in use. Rather, the molecules are transiently transferred to different 
states, usually fluorescence ‘on’ and fluorescence ‘off’ states, so that they are dis- 
tinguishable when using a (diffraction-limited) illumination pattern to probe their 
signals [2]. 

An important aspect for mapping the molecules is that the transient state change 
can occur in a spatially controlled (i.e., coordinate-targeted) or in a spatially stochas- 
tic manner. The first kind is realized in methods called stimulated emission deple- 
tion (STED) [3, 4], saturated structured-illumination microscopy (SSIM) [5] and 
reversible saturable/switchable optical fluorescence transitions (RESOLFT) [6, 7]. In 
these approaches, a pattern of light with one or multiple intensity minima switches the 
molecules optically between an ‘on’ and an ‘off’ state, thus transferring all molecules 
to one of these states except those located at or near the intensity minima. Scanning 
the pattern of light across the specimen ensures that every molecule ends up in a 
subdiffraction-sized region at least once, and hence is for that time in a different 
state from its resolved neighbors. 

The highest combined resolution along all three spatial dimensions has been 
achieved for STED or RESOLFT microscopes with two opposing lenses, where 
the illumination as well as the detected fluorescence light from both lenses can be 
combined in a coherent manner [8-10]. 

Spatially stochastic methods, among them photoactivated localization microscopy 
(PALM) [11] and stochastic optical reconstruction microscopy (STORM) [12], bring 
molecules in close proximity independently to an on-state in which the individual 
molecules can emit multiple and ideally a large number of fluorescence photons. The 
collected signal is then used to estimate the position of the isolated molecules with 
subdiffraction precision. Again, the use of two opposing lenses coherently combin- 
ing the fluorescence light has delivered very high and close-to-isotropic 3D resolu- 
tion [13]. 

A recent breakthrough development, MINFLUX, has attained the ultimate 
molecule-size, one-nanometer resolution scale at minimal fluxes of emitted fluores- 
cence photons [14]. MINFLUX operates with spatially stochastic single-molecule 
switching, but makes use of one or more coordinate-giving intensity minima of exci- 
tation light to make the controlled, known position of a minimum coincide with the 
molecule position and determine it very efficiently in terms of registered fluorescence 
photons. 

Although the spatially stochastic methods can provide molecular maps [15], 
counting molecules with stochastic methods is not as straightforward as it may 
appear. Molecules which do not emit sufficient numbers of photons while residing 
in the on-state to be detected, or which do not assume this state at all, are missed out 
completely. Other molecules might occupy the on-state repeatedly, and thus might be 
counted multiple times, thus requiring a careful and non-trivial calibration. For these 
reasons, fluorophores which assume the on-state only once would be favorable in 
principle, but such fluorophores would allow only a single super-resolution record- 
ing, meaning that the molecular counting would not be repeatable. As an additional 
aspect to consider, counting molecules one by one requires an extended recording 
time in which the molecules must remain stationary. 

In contrast, STED and RESOLFT microscopy are not based on single-molecule 
detection and, by registering signals from all molecules from a given coordinate 
simultaneously, they provide a potential speed advantage. However, this very same 
fact has made the counting of molecules in ensemble-based nanoscopy imaging 
modes challenging. In most cases, the actual brightness of individual molecules 
in an experiment remained elusive, because the sample did not contain spatially 
sparse molecules found on their own. Furthermore, local environment heterogeneity 
can induce local variations of the molecular brightness. However, if the average 
contribution of a single molecule to the recorded image can be reliably estimated, 
the number of participating molecules can simply be deduced from the magnitude 
of the fluorescence signal. 

A reliable method to extract the numbers of molecules in STED or RESOLFT 
microscopy is very desirable, and substantial progress has been achieved towards 
this goal. Indeed, a careful analysis of the photon arrival statistics in STED and 
RESOLFT imaging, especially the study of (1) occurrences of simultaneous arrivals 
of fluorescence photons in STED as well as the (2) fluctuations in signal of repeated 
recordings at the same scan position in RESOLFT, reveals higher-order dependencies 
of the recorded photon statistics on the number of molecules and their brightness. 
Such a careful analysis makes it possible to disentangle number and brightness and thus map 
the number of molecules in an image. The effects and statistical signatures harnessed 
are fully compatible with the subdiffraction resolution of STED and RESOLFT and 
can therefore readily be applied also in a live-cell imaging regime. 

In the following sections, these two relatively new quantitative nanoscopy methods 
will be presented, with an emphasis on the statistical modeling that goes beyond a 
purely ‘classical’ shot-noise description of photon statistics. 


7.1.1 Molecular Contribution Function (MCF) 


In analogy to the point spread function (PSF), which corresponds to the image of a 
theoretical point source, the molecular contribution function (MCF) can be defined as 
the quantitative spatial distribution of the average number of photons counted from a 
single fluorophore. With knowledge of the MCF, for all linear imaging systems, in which 
the recorded image is the sum of the contributions of all the individual molecules, 
counting molecules becomes a conceptually straightforward and accessible task 
of normalizing the image by the MCF. For example, a space-invariant, linear imaging 
system expresses the measured image Y (x) at each scan position x as the convolution 
of the number density n(x) with the MCF(x) 


$$
Y(x) = (n * \mathrm{MCF})(x) + \varepsilon(x), \tag{7.1}
$$

where $*$ denotes the convolution operator and $\varepsilon(x)$ describes the measurement noise, 
i.e., the deviation of a particular recorded image from the average (noise-free) image. 

Estimation of the unknown number density n(x) from the data then amounts to 
simply deconvolving the data with a carefully calibrated MCF. 

Counting the number of molecules in defined isolated regions, which in principle 
could be as small as the resolution scale, means a division of the summed image data 
in those regions by the average total signal of a single molecule, i.e., the integral over 
the MCF. 
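
The counting rule for isolated regions can be sketched numerically. The example below is only an illustration under stated assumptions: a one-dimensional pixel grid, a Gaussian-shaped MCF with made-up amplitude and width, two hypothetical clusters, and no measurement noise. It builds the noise-free part of (7.1) and then divides the summed signal in each region by the integral of the MCF.

```python
import numpy as np

# Hypothetical 1D example: pixel grid, Gaussian MCF and two isolated clusters.
x = np.arange(200)
# Assumed MCF: mean detected photons per molecule and pixel (values are made up).
mcf = 50.0 * np.exp(-0.5 * (np.arange(-20, 21) / 4.0) ** 2)

n_true = np.zeros(x.size)
n_true[60] = 3    # cluster of 3 molecules
n_true[140] = 7   # cluster of 7 molecules

# Noise-free part of (7.1): number density convolved with the MCF.
image = np.convolve(n_true, mcf, mode="same")

# Counting in isolated regions: summed signal divided by the integral of the MCF.
signal_per_molecule = mcf.sum()
count_left = image[:100].sum() / signal_per_molecule
count_right = image[100:].sum() / signal_per_molecule
print(count_left, count_right)
```

With shot noise added to the image, the two counts would fluctuate around the true numbers instead of reproducing them exactly.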

Even though a considerable part of the theory in this chapter is presented using 
continuous functions mostly for the sake of simplicity of notation, it is understood 
that all recorded microscopic data is pixelated with a pixel size smaller than the spatial 
resolution. The transition between continuous and discrete data grids is straightfor- 
ward and may be realized implicitly wherever it is convenient. 

Note that every fluorescence microscopy technique with linear imaging conditions 
and independent and identically behaving fluorophores features an MCF, which lends 
itself to mapping the number of fluorophores. However, the MCF and the total signal 
of a single molecule will depend on the used optics, measurement properties and 
chosen fluorophores. The main task of quantitative STED/RESOLFT nanoscopy 
therefore lies in determining the average signal per fluorophore intrinsically from 
the dataset itself. 


7.2 STED Nanoscopy with Coincidence Photon Detection 


Measurements of the statistics of simultaneous photon arrivals in fluorescence 
microscopy have been shown to identify individual fluorophores [16, 17], to improve 
the resolution of fluorescence microscopy [18, 19] and have been used to analyze 
individual clusters of molecules distributed in space [20, 21]. In the next section, a full 
imaging model of simultaneous photon arrivals in confocal and STED microscopy 
is derived, and afterwards a live-cell imaging experiment will be described. 

In a scanning fluorescence microscope with a pulsed illumination light source 
(the preferred approach for photon coincidence measurements), the specimen at 
each recorded pixel position experiences a certain number of light pulses. If exci- 
tation of the molecules is accomplished with pulses with a duration much shorter 
than the fluorescence lifetime then each molecule can be excited at most once per 
pulse. The probability that a molecule is excited and yields a detected photon 
when the focal center of the scan and the coordinate of the molecule coincide is 
called the molecular brightness $\lambda$. Reflecting changes in local environment, it may 
change with the location of a molecule, but here it is assumed that all molecules at 
a given location (within a resolved region) also share the same brightness and that 
$\lambda(x)$ remains constant over time. The optimal case of $\lambda = 1$ would be reached if 
the molecule contributed with a detected photon for every pulse. In practice, strong 
excitation leads to increased photo-bleaching and a widening of the spatial region in 
which fluorophores are in the saturated regime (broadening of the excitation spot), 
which must be avoided. The quantum yield of the fluorophores is limited and the 
detection optics cannot collect and detect every emitted photon, resulting in a molec- 
ular brightness considerably below one (typically on the order of 0.01). For molecules 
not at the center of the scanned focal spot, the effective molecular brightness has to 
be additionally scaled by the PSF (h). These aspects are contained in the MCF. The 
goal is to measure the statistics of the numbers of detected photons from the sample 
after each light pulse. 

In a suitable experimental arrangement [22], the detected photons from all 
molecules at a scan coordinate are distributed by an array of beam splitters randomly 
onto four equally sensitive detectors in order to measure the numbers of detected 
photons for every pulse (see Fig. 7.3a). Detectors with at least one assigned photon 
are considered active and count as a detection event. The number of active detectors 
thus becomes the experimentally accessible value. A direct quantitative detection of 
the numbers of fluorescent photons with a time resolution of at least the duration 
between consecutive light pulses would simplify the statistical model as well as the 
experimental setup and would therefore be highly desirable, but at the time when 
these experiments were conducted an array of APD detectors capable of counting 
single photons with a dead time larger than the fluorescence lifetime was deemed an 
acceptable compromise between the ability to detect simultaneously arriving photons and 
experimental effort. 
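
The effect of counting active detectors rather than photons can be made concrete by enumeration. The sketch below assumes, as an idealization of the beam-splitter array, that every photon is assigned independently and with equal probability to one of the detectors; the function name is hypothetical. It lists the probability of each number of active detectors for a given number of photons.

```python
from fractions import Fraction
from itertools import product

def active_detector_distribution(n_photons, n_detectors=4):
    """Return {number of active detectors: probability} for photons that are
    independently and uniformly assigned to the detectors (assumed idealization)."""
    counts = {}
    for assignment in product(range(n_detectors), repeat=n_photons):
        active = len(set(assignment))  # detectors that received at least one photon
        counts[active] = counts.get(active, 0) + 1
    total = n_detectors ** n_photons
    return {k: Fraction(v, total) for k, v in sorted(counts.items())}

two_photons = active_detector_distribution(2)
three_photons = active_detector_distribution(3)
print(two_photons)
print(three_photons)
```

For two photons and four detectors, one quarter of the assignments put both photons on the same detector, so such pulses register as single-detector events.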


7.2.1 Statistical Model 


The probability for a single molecule to emit more than one photon in the duration 
of the excitation pulse is negligible. Therefore the photon emission process can be 
well described by a multinomial random process with $\lambda_i$ the molecular brightnesses 
of molecules $i = 1, \ldots, N$ located at positions $u_i$. The probability $p_i$ of a molecule 
to contribute with a photon to the detection at the current scanning position $x$ is 
$p_i(x) = \lambda_i h(x - u_i)$ with the PSF of the system $h$. 

Due to superposition and independence of the molecular markers, for each scan 
position the number of contributed photons follows a discrete probability distribu- 
tion of a sum of independent Bernoulli trials with parameters p;. The probability 
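that exactly $k$ photons contribute during a single pulse is denoted by $Q_k(x)$. The 
expressions for $k = 1, 2$ become 

$$
\begin{aligned}
Q_1(x) &= \sum_{i=1}^{N} p_i(x) \prod_{j \neq i} \bigl(1 - p_j(x)\bigr), \\
Q_2(x) &= \sum_{i=1}^{N} p_i(x) \sum_{j > i} p_j(x) \prod_{k \neq i, j} \bigl(1 - p_k(x)\bigr),
\end{aligned}
\tag{7.2}
$$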
expressing all possibilities of exactly one or two molecules contributing with a photon. 
The $p_i(x)$ are much less than unity, and one can simplify these expressions 
further by neglecting terms with higher orders in $p_i(x)$: 

$$
\begin{aligned}
Q_1(x) &\approx \sum_{i=1}^{N} p_i(x), \\
Q_2(x) &\approx \frac{1}{2} \left[ \Bigl( \sum_{i=1}^{N} p_i(x) \Bigr)^{2} - \sum_{i=1}^{N} p_i(x)^{2} \right].
\end{aligned}
\tag{7.3}
$$

$Q_2$ is approximately given by the probability to obtain two photons from any 
molecule minus the so-called ‘antibunching term’ $\sum_i p_i(x)^2$, which accounts for 
the unphysical case that two photons would originate from the very same molecule. 
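
To get a feeling for the quality of the lowest-order approximation, the exact expressions (7.2) can be compared with (7.3) numerically. This is a minimal sketch: the function names and the choice of ten molecules with equal contribution probability 0.013 are assumptions made for illustration only.

```python
import numpy as np

def exact_q1_q2(p):
    """Exact probabilities of exactly one and exactly two contributing photons, as in (7.2)."""
    p = np.asarray(p, dtype=float)
    n = len(p)
    q1 = 0.0
    q2 = 0.0
    for i in range(n):
        others = np.delete(p, i)
        q1 += p[i] * np.prod(1.0 - others)
        for j in range(i + 1, n):
            rest = np.delete(p, [i, j])
            q2 += p[i] * p[j] * np.prod(1.0 - rest)
    return q1, q2

def approx_q1_q2(p):
    """Lowest-order approximations, as in (7.3)."""
    p = np.asarray(p, dtype=float)
    s = p.sum()
    return s, 0.5 * (s ** 2 - np.sum(p ** 2))

p = np.full(10, 0.013)   # hypothetical: 10 molecules, each with p_i = 0.013
q1, q2 = exact_q1_q2(p)
a1, a2 = approx_q1_q2(p)
print(f"Q1 exact {q1:.5f} vs approx {a1:.5f}")
print(f"Q2 exact {q2:.6f} vs approx {a2:.6f}")
```

For these values the approximations overestimate the exact probabilities by roughly ten percent, which indicates how quickly higher-order terms become relevant as the summed contribution probability grows.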


7.2.1.1 Distribution on Active Detectors 


Because the utilized detectors are not able to quantitatively detect the number of 
incident photons, information about the numbers of contributing photons is partially 
lost. This loss can be taken into account by geometrical factors. For example, 
$\beta = (d - 1)/d$ is the probability that two photons are registered on two different detectors 
for $d$ available detectors. Let $D_i(x)$ be the mean number of active detectors at each 
scan position. Neglecting higher order terms of $Q_i$ gives 

$$
\begin{aligned}
D_1(x) &= Q_1(x), \\
D_2(x) &= \beta\, Q_2(x).
\end{aligned}
\tag{7.4}
$$
Using $d = 4$ detectors results in a loss of about 25% of the two-photon incidence 
events compared to the ideal case of detecting all contributing photons. $n(x)$ is 
denoted as the local fluorophore density and $\lambda(x)$ as the molecular brightness (defined 
only where $n(x) > 0$). The transition to a continuous grid can be easily performed 
by 

$$
\sum_{i=1}^{N} p_i(x)^{l} \;\rightarrow\; (\lambda^{l} n) * h^{l}
$$

with $*$ the convolution operator. On a continuous grid, the mean number of active 
detectors per light pulse becomes 

$$
\begin{aligned}
D_1^m(x) &= \bigl( (\lambda n) * h_m \bigr)(x), \\
D_2^m(x) &= \frac{\beta}{2} \left[ \bigl( (\lambda n) * h_m \bigr)^{2} - (\lambda^{2} n) * h_m^{2} \right](x),
\end{aligned}
\tag{7.5}
$$
where $m$ denotes the imaging mode ($m \in \{c, s\}$ for confocal or STED recordings), 
which affects the width of the PSF. If $(\lambda n) * h$ is not much smaller than one, for 
example if the number of simultaneously recorded molecules is large, additional, 
higher-order terms must be included. Expressions including a possible Poissonian 
background contribution and higher-order terms are given in [22] (see Supplementary 
Note). $D_2^m$ includes the interesting ‘antibunching term’ $(\lambda^2 n) * h_m^2$, which depends 
quadratically on $\lambda(x)$, thus making it a critical parameter. 
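
The role of the antibunching term can be seen in the simplest possible case. The sketch below assumes a single cluster of n molecules of equal brightness sitting exactly at the focal centre, so that the PSF factor equals one, and it uses only the lowest-order expressions with d = 4 detectors. The function names are hypothetical, and the inversion is a noise-free illustration rather than the fitting procedure of the text.

```python
def forward_model(n, lam, beta=0.75):
    """Lowest-order mean numbers of one- and two-detector events per pulse for
    n molecules of equal brightness lam at the focal centre (PSF factor h = 1)."""
    d1 = n * lam
    d2 = 0.5 * beta * ((n * lam) ** 2 - n * lam ** 2)
    return d1, d2

def invert_model(d1, d2, beta=0.75):
    """Recover (n, lam) from the two mean event rates of the idealized model."""
    antibunching = d1 ** 2 - 2.0 * d2 / beta   # equals n * lam**2
    n = d1 ** 2 / antibunching
    lam = d1 / n
    return n, lam

d1, d2 = forward_model(n=10, lam=0.013)
n_est, lam_est = invert_model(d1, d2)
print(n_est, lam_est)
```

Without the antibunching term the two event rates would both depend on the product of number and brightness only, and the two quantities could not be separated.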


7.2.1.2 Estimation of Molecule Density and Brightness Distribution 


The average number of active detectors $D_i^m$ can be estimated empirically by the 
measured occurrences of active detector events in an experiment $Y_i^m(x)$ normalized 
by the number of repetitions $t$, i.e., the number of applied laser light pulses per pixel. 
The desired molecule density and brightness distributions are ultimately extracted 
by fitting the data with the model in (7.5). This is accomplished by the fast proximal 
gradient algorithm FISTA [23], which minimizes the squared distance between the 
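model and the experiment while also penalizing strong variations in $\lambda(x)$. The estimated 
molecule density $n(x)$ and molecular brightness $\lambda(x)$ are the solution of the 
constrained optimization problem: 

$$
\operatorname*{argmin}_{n, \lambda} \; \sum_{m = c, s} \; \sum_{i = 1, 2} \alpha_{i,m} \left\| D_i^m(n, \lambda) - \frac{Y_i^m}{t} \right\|^{2} + \gamma\, \phi(\lambda)
\quad \text{subject to} \quad n(x) \geq 0, \; \lambda(x) \geq 0
\tag{7.6}
$$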


with $\alpha_{i,m}$ and $\gamma$ positive weighting parameters and $\phi$ a typical penalization term, the 
Laplacian of the brightness ($\phi(\lambda) = \nabla^2 \lambda$) in order to enforce smoothness in the 
brightness distribution. 

In (7.6) data recorded in both the confocal and STED imaging mode of the same 
specimen is jointly optimized, which is advantageous in practical experiments. The 
confinement to only one of the imaging modes in the fit is straightforward. With the 
value of $\gamma$ appropriately chosen, the penalization sufficiently stabilizes the solution of 
(7.6), preventing strong spatial oscillations in brightness on scales below the resolution. 
The weighting parameters $\alpha_{i,m}$ are chosen such that all least-square residuals are 
approximately on the same scale ($\alpha_{i,m} \approx 1 / \sum_x Y_i^m(x)$). In order to incorporate the 
non-negativity constraints, $n(x)$ and $\lambda(x)$ are substituted by squared variables $m^2(x)$ 
and $q^2(x)$, and (7.6) was solved for $m(x)$ and $q(x)$ instead. As starting point for 
the numerical optimization, a deconvolved single active detector image was chosen, 
which provided an initial molecular density for a given reasonable choice of the 
average molecular brightness. 
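
The substitution by squared variables can be illustrated on a one-variable toy problem. This sketch does not use FISTA and does not solve (7.6); it swaps in plain gradient descent on a hypothetical quadratic objective, purely to show that writing the unknown as a square keeps it non-negative without any explicit projection step.

```python
def minimize_with_squared_variable(target, start=1.0, step=0.05, n_iter=500):
    """Minimize (n - target)**2 subject to n >= 0 by writing n = m**2 and running
    plain gradient descent on m (toy objective, not the functional in the text)."""
    m = start
    for _ in range(n_iter):
        residual = m * m - target
        gradient = 4.0 * m * residual   # d/dm of (m**2 - target)**2
        m -= step * gradient
    return m * m

n_feasible = minimize_with_squared_variable(target=2.0)    # unconstrained optimum is feasible
n_clipped = minimize_with_squared_variable(target=-1.0)    # unconstrained optimum is negative
print(n_feasible, n_clipped)
```

When the unconstrained optimum is negative, the iteration drives m towards zero, so the recovered value sits on the boundary of the feasible set.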

The main cause of deviation between model and measurement is shot noise. 
Interestingly, the relative standard deviations (RSTD) of the estimated number of 
molecules $\hat{n}$ and brightness $\hat{\lambda}$ at a given position with molecular signals (such as 
from a single cluster) can be derived analytically (see [22]) 

$$
\mathrm{RSTD}(\hat{n}) = \mathrm{RSTD}(\hat{\lambda}) \propto \sqrt{\frac{n - 1}{n \lambda^{2} t}}
\tag{7.7}
$$

and is confirmed by simulations (see Fig. 7.1). Calculations and simulations both 
suggest that the RSTD is rather independent of the actual number of molecules and 
can be below ~20% for conditions provided by synthetic fluorophores ($\lambda \approx 0.01$, 
$t > 10^3$, i.e. more than 1000 pulses) [24]. 

Fig. 7.1 Simulation of the error of the estimation of number and brightness for a single cluster 
of molecules in the confocal mode. a, c Relative estimated numbers of molecules $\hat{n}/n$ and b, d 
relative estimated brightness $\hat{\lambda}/\lambda$. The number of molecules n varies from 1-50 with a brightness 
of $\lambda = 0.013$ (a, b). The brightness varies (0.005-0.035) while keeping the number of molecules 
constant n = 10 (c, d). The number of illumination pulses is 2.5 - 10* (a, b) and 4 - 10% (c, d). 
The FWHM of the PSF is 240 nm. Mean, quantiles (15, 85%) and analytically derived rel. standard 
deviations are shown. Figure adapted from [22] 

Please note that in the imaging model (mean number of active detectors) in (7.5) 
and possible generalizations, higher-order terms represent convolutions of the molecular 
density with higher orders of the PSF, $h^l$. In principle, these terms represent parts 
of the image with an increased resolution; however, their contribution is rather low 
as they scale with higher orders of the molecular brightness. Measurements of these 
higher-order PSFs are displayed in Fig. 7.2. 


7.2.2 Intrinsic Molecular Brightness Calibration 


Double-stranded DNA (dsDNA) is easy to synthesize and label in a controlled manner. 
Labeling the dsDNA with up to four ATTO 647N fluorophores and immobilizing 
it sparsely on a glass surface allows controlled measurements as well as independent 
estimates of the number of fluorophores by observing individual bleaching 
steps. A schematic of a microscope with a fluorescence detection path split into four 
equivalent units is shown in Fig. 7.3a. For a measurement in confocal mode with 
t = 3000 excitation pulses per pixel, resulting in 6000-7500 collected photons per 
fluorophore in the resulting scanned image, the number of fluorophores can be estimated 
with ~20% uncertainty and matches the number of observed bleaching steps 
(see Fig. 7.3). Compared with observing abrupt changes in brightness by stepwise 
bleaching, the molecules remain intact (except for the gradual bleaching during 
multiple scans), and the approach avoids the uncertainty about the number of 
simultaneously bleached molecules from which the observation of bleaching steps 
often suffers. 

Fig. 7.2 The PSFs of 1-4 active detector events in confocal and STED mode. DNA origami with 
up to 24 ATTO 647N in two lines of 41 nm spacing was sparsely immobilized and measured in 
confocal (a, overlay of 64 single DNA images) and STED (b, overlay of 114 single DNA images) 
mode. On the right: normalized line profiles crossing the center of the PSFs. All overlays allowing 
a fit with a 2D Gaussian peak function display resulting FWHM values. H is the maximum value 
(events). Scale bars: 100 nm. Figure adapted from [22] 


7.2.3 Counting Transferrin Receptors in HEK293 Cells 


The bleaching rate of ATTO 647N in the experiments shown in the previous section 
(Figs. 7.2 and 7.3) is as low as ~3% per full scan [22]. This allows a STED recording 
to be performed right afterwards, so that the photon statistics of both recordings 
can be combined and the optimization algorithm can use both datasets for the solution 
of (7.6). The combined data yields the molecular numbers, mostly derived from 
the statistical information contained in the confocal recordings, with a spatial 2D 
resolution of ~50 nm mostly originating from the STED recording. Using the ability 
of fluorescence microscopes to look into the interior of cells, transferrin receptor 
(TfR) molecules in human embryonic kidney 293 (HEK293) cells were counted. The 
TfR molecules were labeled by an ATTO 647N-conjugated DNA aptamer because 
attaching a single fluorophore to each aptamer molecule can be performed with high 
precision [25]. With the aptamers designed to bind to their target in a one-to-one 
stoichiometry, quantification becomes straightforward. Images of the distribution of 
TfR clusters are shown in Fig. 7.4. Using information from the two active detector 
events, one can estimate that the total number of internalized TfRs in each cell is on 
the order of ~100000. In addition, this counting method can determine molecular 
density variations of internalized TfRs in the cell, which is not possible by conventional 
measurements such as quantitative Western blots. With the spatial resolution 
delivered by the STED recording, most intracellular TfRs can be visualized as separate 
clusters. Interestingly, the number of estimated TfRs in each isolated cluster 
closely follows an exponential distribution with an expectation of ~6.0 ± 1.9. It 
also indicates that a single cluster may have a capacity to accommodate more than 
20 TfR molecules since the measurement fits an exponential distribution up to 20 
fluorophores closely. 

Fig. 7.3 Experimental setup and measurements on double-stranded DNA (dsDNA) labeled with 
up to four ATTO 647N. a Confocal/STED microscope equipped with four independent detection 
channels. BS 1:1 beam splitter, $D_i$ i-th detector (i = 1-4). b Fluorescence bleaching steps of single 
dsDNA (corresponding spots are indicated in d by triangles of the same colors). c Comparison 
of the number of dye molecules (mean and s.d.) derived from photon incidence analysis with the 
detected number of bleaching steps of the same single dsDNA (red line: y = x). d, e Example 
image showing one- and two-photon detection events measured in the confocal mode. The dsDNA 
positions with only a single dye molecule are indicated by open triangles in (e). f Established map 
of the number of ATTO 647N molecules, derived from the data in (d) and (e). H is the maximum 
of the color scale, representing events in d and e and molecules in f. Scale bars: 1 µm. Figure 
reproduced from [22] 

Fig. 7.4 Counting the number of transferrin receptors (TfR) in HEK293 cells. Living cells were incubated 
with ATTO 647N-conjugated anti-TfR aptamer. After incubation, excess aptamer molecules were 
washed off and cells were chemically fixed. Stained receptors were imaged in confocal and STED 
mode, with 100 nm increments along z. a Summed axial projection of confocal and STED images 
(raw data) along 0.9 µm depth. b Estimated 3D molecular map resulting from photon statistics of 
both confocal and STED recordings. Colors represent the axial position. c Isosurfaces of the molecular 
map in the boxed region in a and b. The isosurfaces include 70% of all molecules in this region. 
The number of molecules in each segment is displayed below. d The histogram of the estimated 
number of TfR receptors per recognized separated segment. The red line is an exponential fit of the 
number of occurrences up to 24 molecules per spot (inset shows the residual of the fit). Scale bars: 
1 µm. Figure reproduced from [22] 


7.3 Mean and Variance in RESOLFT Nanoscopy 


RESOLFT experiments require switchable fluorophores with on- and off-states [2]. 
For simplicity, only the positive imaging mode is discussed here, i.e., where the MCF 
features a sharp, subdiffraction-sized positive signal peak. This is the commonly used 
imaging mode. Another common assumption is that the molecules independently 
switch between their states and independently emit fluorescence. 

Two main steps can be differentiated. First, fluorophores are transferred from 
the off-state, in which they reside initially, into the on-state only within a sharp, 
subdiffraction-sized region centered at the current scanning position s. In practice, 
this is achieved by first activating fluorophores with a diffraction-limited focus, and 
then deactivating fluorophores in the periphery using a doughnut-shaped focus with 
a central intensity minimum [6]. The spatial distribution of the probability of a single 
fluorophore to be effectively activated after the first step is referred to as the activation 
probability p(x). As a second main step of RESOLFT, activated fluorophores are 
then read out by recording their fluorescence signal. The spatial distribution of the 
average recorded fluorescence photon signal of a single activated fluorophore during 
the readout time is $\lambda(x)$, which is typically proportional to the confocal PSF; the 
readout can also be performed in STED mode [26]. 

The MCF of RESOLFT nanoscopy is therefore given by the activation probability 
multiplied by the average readout signal, 


$$
\mathrm{MCF}(x) = p(x)\, \lambda(x). \tag{7.8}
$$


In the following, the influence of the switching step on the obtained photon statis- 
tics is studied. To demonstrate the principal statistical properties of the two-step 
imaging process in RESOLFT, a simplified model is analyzed. 


7.3.1 Cumulants of the Fluorescence of Switchable 
Fluorophores 


Let us begin by assuming n molecules, all with the same activation probability p
and the same brightness λ, which is the average signal that is obtained from a single
activated fluorophore. Further, let us assume that there is no spatial dependency in the
measurement process (e.g. a situation of an isolated cluster of molecules where the 
scanning position and the cluster position coincide). There are three unknown prop- 
erties (number of molecules, activation probability and brightness) but only a single 
variable (fluorescence signal) that is measured. How can the unknown properties be 
retrieved with only a single measurement variable? The basic idea is to exploit higher- 
order characteristics (cumulants) of the measured signal that go beyond the mean in 
order to separate the unknown properties. Figure 7.5 illustrates this idea, showing that 
different choices of parameters resulting in the same mean signal can still differ in 
their variance (or other higher-order characteristics). For a purely Poisson-distributed 
signal, all cumulants revert to the same value, the mean of the signal. It turns out that 
the switching step in RESOLFT nanoscopy is the essential factor in separating the 
number and the brightness. 

The goal is to calculate the cumulants of the fluorescence signal of a single
molecule, given p and λ. The signal can be modeled as the result of a two-step
stochastic process, where the first step is the activation and the second step is the
photon emission by the activated fluorophore. The activation is modeled as a Bernoulli
process with activation probability p. The activation state of the fluorophore is
represented by a Bernoulli random variable A. If the fluorophore is in the on-state it will
yield λ detected photons on average, which is modeled by a Poisson distribution.
Therefore, the random variable describing the signal of a fluorophore B given the
activation state A is

\[ B \mid A = 1 \sim \mathrm{Poisson}(\lambda), \qquad B \mid A = 0 \sim 0. \tag{7.9} \]


Fig. 7.5 Left: Depiction of a cluster of n = 20 molecules with different activation probabilities
p and different brightnesses λ. Right: Photon counting histogram. The mean remains unchanged
for all cases, while the variance differs. Calculating statistical parameters of the measured photon
numbers allows estimating the number of molecules. Figure adapted from [27]


The characteristic function φ(t) of a Bernoulli-distributed random variable is
1 − p + p exp(it) and that of a Poisson-distributed random variable is exp(λ(e^{it} − 1)).
The characteristic function of B can be computed with a conditional expectation
over A,

\[ \varphi(t) = E_A\, E_{B|A} \exp(itB) = 1 - p + p \exp\!\left(\lambda\,(e^{it} - 1)\right), \tag{7.10} \]

where E represents the expectation. The cumulant-generating function H(t) is the
logarithm of φ with a series expansion giving the cumulants κ_n,

\[ H(t) = \log \varphi(t) = \sum_{n=1}^{\infty} \kappa_n \frac{(it)^n}{n!}. \tag{7.11} \]

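The cumulants are polynomials of increasing order in p and λ:

\[
\begin{aligned}
\kappa_1 &= p\lambda, \\
\kappa_2 &= p\lambda + p(1-p)\lambda^2, \\
\kappa_3 &= p\lambda + 3p(1-p)\lambda^2 + p(1-p)(1-2p)\lambda^3.
\end{aligned}
\tag{7.12}
\]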
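
The single-molecule cumulants can be checked numerically. The following is a minimal sketch, assuming illustrative values p = 0.2 and λ = 3 that are not taken from the text; it samples the two-step process (Bernoulli activation followed by Poisson emission) and compares the empirical mean, variance and third central moment with the polynomial expressions for κ1, κ2 and κ3.

```python
import math
import random

def sample_signal(p, lam, rng):
    """One draw of B: Bernoulli activation, then Poisson photon count."""
    if rng.random() >= p:          # fluorophore stays in the off-state
        return 0
    # Knuth's multiplication method for a Poisson draw (adequate for small lam)
    limit, k, prod = math.exp(-lam), 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k

def theoretical_cumulants(p, lam):
    """First three cumulants of the single-molecule signal B."""
    k1 = p * lam
    k2 = p * lam + p * (1 - p) * lam**2
    k3 = p * lam + 3 * p * (1 - p) * lam**2 + p * (1 - p) * (1 - 2 * p) * lam**3
    return k1, k2, k3

def empirical_cumulants(samples):
    """Mean, variance and third central moment (the first three cumulants)."""
    n = len(samples)
    mean = sum(samples) / n
    var = sum((x - mean) ** 2 for x in samples) / n
    third = sum((x - mean) ** 3 for x in samples) / n
    return mean, var, third

rng = random.Random(1)                 # fixed seed for reproducibility
p, lam = 0.2, 3.0                      # illustrative values, not from the text
draws = [sample_signal(p, lam, rng) for _ in range(200_000)]
theory = theoretical_cumulants(p, lam)
measured = empirical_cumulants(draws)
print(theory)
print(measured)
```

For p = 1 the excess terms vanish and all three cumulants collapse to λ, the pure Poisson case mentioned above.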


The molecules are assumed to act independently from each other, so that the 
cumulant of the summed signal of all n molecules can conveniently be expressed as 
the sum of single-molecule cumulants. This means that the expressions in (7.12) are 
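simply scaled by n.

If the first three cumulants are also estimated empirically as κ̂1, κ̂2 and κ̂3, one
could solve the non-linear equation system in (7.12) to obtain estimates p̂, λ̂ and n̂ for
the activation probability, the brightness and, most importantly, the number of
molecules:

\[
\begin{aligned}
\hat n &= \frac{\hat\kappa_1^2\,(\hat\kappa_1 - \hat\kappa_2)}{\hat\kappa_1^2 - \hat\kappa_1\hat\kappa_2 - \hat\kappa_2^2 + \hat\kappa_1\hat\kappa_3}, \\
\hat p &= \frac{\hat\kappa_1^2 - \hat\kappa_1\hat\kappa_2 - \hat\kappa_2^2 + \hat\kappa_1\hat\kappa_3}{\hat\kappa_1\hat\kappa_2 - 2\hat\kappa_2^2 + \hat\kappa_1\hat\kappa_3}, \\
\hat\lambda &= \frac{\hat\kappa_1\hat\kappa_2 - 2\hat\kappa_2^2 + \hat\kappa_1\hat\kappa_3}{\hat\kappa_1\,(\hat\kappa_1 - \hat\kappa_2)}.
\end{aligned}
\tag{7.13}
\]
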
This principle is exploited in the next section for a realistic RESOLFT imaging 
model. 
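
As a consistency check of the inversion, the sketch below computes the exact cumulants of a cluster (n times the single-molecule cumulants) and recovers n, p and λ from them with the closed form of (7.13). The parameter values (n = 20, p = 0.2, λ = 3) and the function names are illustrative assumptions; with real data the cumulants would be noisy estimates, and the inversion is ill-defined for p = 1, where κ1 = κ2.

```python
def cluster_cumulants(n, p, lam):
    """Cumulants of the summed signal of n independent, identical fluorophores."""
    k1 = n * p * lam
    k2 = n * (p * lam + p * (1 - p) * lam**2)
    k3 = n * (p * lam + 3 * p * (1 - p) * lam**2
              + p * (1 - p) * (1 - 2 * p) * lam**3)
    return k1, k2, k3

def invert_cumulants(k1, k2, k3):
    """Recover (n, p, lam) from the first three cumulants, following (7.13)."""
    common = k1**2 - k1 * k2 - k2**2 + k1 * k3
    shared = k1 * k2 - 2 * k2**2 + k1 * k3
    n_hat = k1**2 * (k1 - k2) / common
    p_hat = common / shared
    lam_hat = shared / (k1 * (k1 - k2))
    return n_hat, p_hat, lam_hat

# Illustrative cluster: 20 molecules, 20 % activation, 3 photons per active molecule
n_hat, p_hat, lam_hat = invert_cumulants(*cluster_cumulants(20, 0.2, 3.0))
print(n_hat, p_hat, lam_hat)
```

Note that the product of the three estimates equals κ̂1 by construction, so the mean signal is always reproduced exactly; the information separating n, p and λ comes entirely from the second and third cumulants.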


7.3.2 Statistical Model 


In the general case, the activation probability and the signal in the active state are
position-dependent. The following transformations hold:

\[ p \to p(x_i - s), \qquad \lambda \to \lambda(x_i - s), \]

with x_i the position of the molecules and s the scanning position. Again, the signal
is a result of a two-stage stochastic process. The activation strength and the
average readout signal, however, both depend strongly on the position of the molecules
relative to the focal center. Rewriting the mean m(s) and variance v(s) of the total
signal using the results of the previous section (see (7.12)) with position-dependent
parameters and adding a Poissonian background d(s) gives

\[
\begin{aligned}
m(s) &= \sum_{i=1}^{N} p(x_i - s)\,\lambda(x_i - s) + d(s), \\
v(s) &= m(s) + \sum_{i=1}^{N} p(x_i - s)\,\bigl(1 - p(x_i - s)\bigr)\,\lambda^2(x_i - s).
\end{aligned}
\tag{7.14}
\]


The transition to a continuous grid can be carried out analogously to the procedure
in Sect. 7.2.1. With the transformation

\[ \sum_{i=1}^{N} p(x_i - s)\,\lambda(x_i - s) \to (n * p\lambda)(s) \]

the mean and variance of the total signal can conveniently be expressed as
convolutions with the density of fluorophores n(x). Additionally, the readout signal λ is
written as a product of the focal brightness b (i.e. λ(0)) and the readout PSF h_read.
The focal brightness is assumed constant for all the molecules.

\[
\begin{aligned}
m(s) &= b\,(n * h_{\mathrm{read}}\, p)(s) + d(s), \\
v(s) &= m(s) + b^2\,\bigl(n * h_{\mathrm{read}}^2\, p\,(1 - p)\bigr)(s).
\end{aligned}
\tag{7.15}
\]


The mean is the convolution of the number density with the MCF plus background
contributions, while the variance exceeds the mean by an additional term with a
convolution kernel b² h_read² p (1 − p), which is similar but not identical to the MCF.
Note that for p(x) = 1 the situation of non-switchable fluorophores is recovered, the 
excess variance term vanishes and a purely Poisson distributed signal is regained. 
In the case of photo-switchable fluorophores, the variance of the collected signal is 
augmented due to the stochastic nature of the activation in the preparation step. The 
excess variance or ‘over-dispersion’ term is proportional to the square of the focal 
brightness b, unlike the mean signal, which scales linearly with it. This relation can 
be used to estimate b directly from integrated mean and variance of the image data 
using a Method of Moments estimator (MME). 
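
To make the structure of (7.15) concrete, the following sketch evaluates the mean and variance images on a one-dimensional pixel grid by direct summation. It is only an illustration under stated assumptions: Gaussian shapes for both p and h_read, no equilibrium activation, and hypothetical values for all widths, the brightness and the background.

```python
import math

def gaussian(x, fwhm):
    """Gaussian profile with unit peak value and the given FWHM."""
    return math.exp(-4.0 * math.log(2.0) * x * x / (fwhm * fwhm))

def mean_and_variance(density, b, p0, w_act, w_read, background, pixel):
    """Evaluate the mean and variance of (7.15) on a 1D grid by direct summation."""
    mean, var = [], []
    for s in range(len(density)):
        m_s, excess = background, 0.0
        for x, n_x in enumerate(density):
            if n_x == 0:
                continue
            dist = (x - s) * pixel
            p = p0 * gaussian(dist, w_act)      # activation probability p(x - s)
            h = gaussian(dist, w_read)          # readout PSF with h_read(0) = 1
            m_s += b * n_x * h * p
            excess += b * b * n_x * h * h * p * (1.0 - p)
        mean.append(m_s)
        var.append(m_s + excess)                # variance = mean + excess term
    return mean, var

density = [0] * 41
density[20] = 20                                # one cluster of 20 molecules
mean, var = mean_and_variance(density, b=1.0, p0=0.2, w_act=70.0,
                              w_read=240.0, background=0.1, pixel=20.0)
print(mean[20], var[20])
```

At the cluster position the excess variance is b² n p (1 − p); setting p0 = 1 removes it there, which reproduces the purely Poissonian limit described above.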

For a region X in the sample that comprises a conglomeration of molecules but
is isolated from other such regions, the position-dependent mean and variance of the
signal can be summed (M = ∫_X m(s) ds, V = ∫_X v(s) ds) and the convolutions in
(7.15) reduce to simple products,

\[ M = b\,N_X H_1 + D, \qquad V = M + b^2 N_X H_2, \tag{7.16} \]

with H_1 = ∫ (h_read p) ds and H_2 = ∫ (h_read² p (1 − p)) ds being integrals of
products of p and h_read, N_X the number of molecules in X, and D = ∫_X d ds the
integrated background. If the integrated mean and variance can be estimated
empirically as M̂ and V̂, then (7.16) can be solved for the focal brightness,
resulting in the MME b̂:

\[ \hat b = \frac{(\hat V - \hat M)\,H_1}{(\hat M - D)\,H_2}. \tag{7.17} \]


The results of a simulation of the RESOLFT imaging process and the estimation
of the focal brightness are shown in Fig. 7.6. Essentially, the relative errors of b̂ and
n̂ depend mainly on the number of measurements, and are largely independent of the
number of molecules in the image [27].


Fig. 7.6 Simulation of the error of the estimation of number and brightness for a single cluster of
molecules. a Relative standard deviation of the estimated number of molecules (for 20 molecules
in a single cluster) with an activation probability of 20% and a variable number of repetitions. b
Relative standard deviation of the estimated brightness under the same conditions. Above a threshold
on the molecular brightness, the counting error mainly depends on the number of repetitions. Figure
adapted from [27]
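

Solving (7.16) for the focal brightness and the molecule number can be sketched as follows. The function names and all numerical values (b = 0.9, N_X = 250, H_1 = 4.0, H_2 = 1.5, D = 30) are hypothetical and chosen only to show that the estimator inverts the forward model exactly when M and V are known without noise.

```python
def integrated_moments(b, n_mol, h1, h2, d):
    """Integrated mean M and variance V of an isolated region, following (7.16)."""
    m = b * n_mol * h1 + d
    v = m + b**2 * n_mol * h2
    return m, v

def brightness_mme(m_hat, v_hat, h1, h2, d):
    """Method-of-moments estimates of b and N_X obtained by solving (7.16)."""
    signal = m_hat - d
    if signal <= 0 or v_hat <= m_hat:
        raise ValueError("no signal above background or no excess variance")
    b_hat = (v_hat - m_hat) * h1 / (signal * h2)
    n_hat = signal / (b_hat * h1)
    return b_hat, n_hat

# Hypothetical region: exact moments from the forward model, then inversion
m, v = integrated_moments(b=0.9, n_mol=250, h1=4.0, h2=1.5, d=30.0)
b_hat, n_hat = brightness_mme(m, v, h1=4.0, h2=1.5, d=30.0)
print(b_hat, n_hat)
```

With measured data, M̂ and V̂ fluctuate, and the guard clause reflects the practical requirement that the variance must measurably exceed the mean before a brightness can be extracted.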


7.3.3 Counting rsEGFP2 Fused α-tubulin Units
in Drosophila Melanogaster


We illustrate the approach for RESOLFT nanoscopy imaging of Drosophila
melanogaster larvae body wall muscles ubiquitously expressing rsEGFP2, a
reversibly photoswitchable fluorescent protein (RSFP) [6], fused to α-tubulin. A
commercial RESOLFT nanoscope was used for the recordings, and the preparation
of the sample is described in [28]. Tubulin is known to form helices with a
diameter of approximately 25 nm, each turn comprising 13 dimers of α- and β-tubulin
that are spaced 8 nm apart [29]. The total density of α-tubulin along a single
filament can thus be estimated to be ~1625 per µm. To analyze the ratio of labeled to
non-labeled α-tubulin subunits in body wall muscles of Drosophila melanogaster,
L3-larvae were dissected to isolate body wall muscles, which were subsequently used
as a sample for Western blot analysis. The ratio of rsEGFP2-α-tubulin to α-tubulin
in the muscle tissue was estimated to be ~1:6. Therefore, the number density of
rsEGFP2 molecules along a microtubule fiber is expected to be ~230 per µm. Can
a RESOLFT imaging experiment confirm this independently measured molecule
density using the framework laid out in Sect. 7.3.2?


7.3.3.1 Switching Kinetics of rsEGFP2 


Measurements of the switching kinetics of rsEGFP2 were conducted on a confocal
point scanning microscope with additional widefield illumination paths for excitation
at 491 nm and activation at 375 nm (both continuous wave). The signal was detected
with a point detector at 525 ± 25 nm. The advantage of widefield illumination and
point-like detection is that averaging effects of the observed kinetics due to
inhomogeneous intensity distributions within the detected volume can be excluded. rsEGFP2
is activated by UV light and switches off while being read out, which means that
fluorescence can also be recorded during the effective activation step in RESOLFT,
while molecules are being switched off.

In this measurement, molecules were activated by applying the UV light for a com- 
paratively long time (several ms), thus strongly saturating the activation. Assuming 
that this prepares all the molecules in the on-state, the fluorescence signal at any 
given time during the off-switching indicates the remaining on-state population of 
rsEGFP2 at that time. An example of an off-switching curve is shown in Fig. 7.7. 
Systematic deviations from a single exponential decay are evident, especially for
intermediate times, motivating the use of a Gamma distributed decay model with
parameters α and β for fitting the off-switching curves,

\[ f(t) \propto \frac{\beta^{\alpha}}{(\beta + t)^{\alpha}}, \tag{7.18} \]

with the total switching rate k given by the inverse of the time β(exp(1/α) − 1)
at which the signal drops to 1/e of the amplitude. The equilibrium signal after complete
switching, normalized by the initial signal, estimates the equilibrium population p∞.
It is ~2.5%, independent of the excitation strength. The switching rate depends
approximately linearly on the excitation light intensity for intensities up to a few
kW/cm². These are values typical for light intensities close to the center of the
focal intensity minima, which determine the resolution enhancement in a RESOLFT
microscope.
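
The relation between the Gamma parameters and the switching rate can be illustrated numerically. The sketch below assumes that the decay has the form (β/(β + t))^α, which is what results from averaging single-exponential decays over a Gamma-distributed rate with shape α and rate parameter β; the parameter values are hypothetical, and the equilibrium offset p∞ is left out for simplicity.

```python
import math
import random

def gamma_decay(t, alpha, beta):
    """On-state fraction for a Gamma(shape=alpha, rate=beta) distributed decay rate."""
    return (beta / (beta + t)) ** alpha

def switching_rate(alpha, beta):
    """Inverse of the time at which gamma_decay has dropped to 1/e."""
    return 1.0 / (beta * (math.exp(1.0 / alpha) - 1.0))

alpha, beta = 2.0, 1.5                     # hypothetical shape and rate parameters
t_e = 1.0 / switching_rate(alpha, beta)    # 1/e time of the decay
print(gamma_decay(t_e, alpha, beta), math.exp(-1.0))

# Monte Carlo cross-check: average exponential decays over random rates.
# random.gammavariate takes a scale parameter, hence 1 / beta.
rng = random.Random(0)
rates = [rng.gammavariate(alpha, 1.0 / beta) for _ in range(100_000)]
mc_value = sum(math.exp(-k * t_e) for k in rates) / len(rates)
print(mc_value)
```

For large α with α/β held fixed, the rate distribution narrows and the curve approaches a single exponential, so the model contains the simpler fit as a limiting case.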


Fig. 7.7 Switching kinetics of rsEGFP2 in living E. coli cells overexpressing rsEGFP2 on a
coverslip in an agarose gel. a Single switching curve with an off-switching light intensity of 500 W/cm².
Fits with a single exponential decay (red) and a gamma distributed exponential decay (yellow)
with residuals shown below. b Estimated switching rate k in dependence on the off-switching light
intensity. c Equilibrium on-state level in dependence on the off-switching light intensity. Figure
reproduced from [27]


7.3.3.2 Shape of the MCF


The MCF for rsEGFP2 molecules is modeled as a simple two-state switching process 
(‘on’/‘off’), with switching rates depending linearly on the applied light intensity and 
the frequently-made approximation that the doughnut of off-switching light features 
a parabolic intensity distribution close to the focal center [30]. The shape of the MCF 
is then a superposition of two Gaussian peaks that differ only in amplitude and width: 


\[ \mathrm{MCF}(r) = b \left( p_\infty\, e^{-4 \ln 2\, r^2 / w_{\mathrm{read}}^2} + \bigl(p(0) - p_\infty\bigr)\, e^{-4 \ln 2\, r^2 / w_{\mathrm{RESOLFT}}^2} \right) \tag{7.19} \]


with the equilibrium activation level p∞ and the focal activation level p(0). The
broader peak with diffraction-limited resolution w_read and low amplitude represents
the signal from the nonzero equilibrium activation (see Fig. 7.7c) while the sharp
peak with increased resolution w_RESOLFT represents the usable RESOLFT signal that
originates from the subdiffraction-sized spot with an effective activation level above 
the equilibrium value. While the absolute scaling factor, the focal brightness b, is 
generally very difficult to predict, the relative amplitudes and sizes of the two Gaus- 
sian spots forming the MCF can typically be modeled reasonably well or retrieved 
from the data itself. Here, the structure consists of microtubule fibers with a width 
well below the resolution. For a sufficiently sparse distribution of filaments and high 
enough signal-to-noise ratio, the object can approximately be estimated from the data 
as a set of curved lines, even without quantitative knowledge of the MCF, simply by 
detecting lines in the image [31]. With the given image data and estimated object the 
shape of the MCF can be retrieved by a deconvolution and a fit of the shape model 
to the deconvolution result. Image data and retrieved object are shown in Fig. 7.8. 
The estimated FWHM of the diffraction-limited peak was 235 nm, and of the sharp 
RESOLFT peak 73 nm, and with the knowledge of the equilibrium activation level 
of ~2.5% the focal activation level can be estimated to be ~17%. 
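
The relative weight of the two peaks can be worked out from the numbers quoted above. The sketch below assumes that both widths are FWHM values of two-dimensional Gaussians, so that each peak integrates to amplitude × π w² / (4 ln 2); the function names are illustrative.

```python
import math

def mcf_profile(r, b, p_inf, p0, w_read, w_resolft):
    """Two-Gaussian MCF model; widths are FWHM values in the same unit as r."""
    c = 4.0 * math.log(2.0)
    broad = p_inf * math.exp(-c * r**2 / w_read**2)
    sharp = (p0 - p_inf) * math.exp(-c * r**2 / w_resolft**2)
    return b * (broad + sharp)

def sharp_peak_fraction(p_inf, p0, w_read, w_resolft):
    """Share of the 2D-integrated MCF carried by the sharp RESOLFT peak."""
    broad = p_inf * w_read**2              # common factor pi / (4 ln 2) cancels
    sharp = (p0 - p_inf) * w_resolft**2
    return sharp / (broad + sharp)

# Values quoted in the text: 2.5 % equilibrium and 17 % focal activation,
# FWHM of 235 nm (diffraction-limited) and 73 nm (RESOLFT)
frac = sharp_peak_fraction(p_inf=0.025, p0=0.17, w_read=235.0, w_resolft=73.0)
print(round(frac, 2))
```

Under these assumptions only about a third of the integrated signal of a molecule stems from the sharp peak, although its amplitude is almost six times that of the broad one. This illustrates why the shape of the MCF, including the low pedestal, has to be known for counting.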


7.3.3.3 Counting the Number of Molecules Along Filaments 


Applying the method of moments estimator for the focal brightness given in (7.17) 
on the whole image shown in Fig. 7.8a yields a value of b of ~0.9 photon counts per 
activated rsEGFP2 molecule. Note that the readout time in this experiment was
set to a very short duration because rsEGFP2 switches off during readout. There are,
however, RSFPs that decouple the off-switching and the readout [32, 33]. The total
average contribution of each fluorophore to the RESOLFT image can be estimated 
by integrating the MCF given by (7.19) using the estimated shape and estimated 
focal brightness. The average value of 3.58 photons per rsEGFP2 molecule can then 
be used to quantify the number of fluorophores in a region in the image. The results 
of applying the estimator to the regions marked in Fig. 7.8a are shown in Table 7.1. 
The average number of rsEGFP2 molecules per µm along a microtubule filament
is ~180, which is similar to but below the expected number density. This might hint
at a less efficient incorporation of rsEGFP2 fused to α-tubulin in microtubules. A
detailed description of this experiment as well as a theory for the counting error is
given in [27].


Fig. 7.8 Counting α-tubulin units in Drosophila melanogaster. a RESOLFT image of dissected
body wall muscle cells expressing rsEGFP2-α-tubulin with regions (A-E) in which the number
of rsEGFP2 molecules was counted. b Line detection of the tubulin structure. c, d PSF shape
detection based on the image (a), the object detection (b) and the equilibrium on-state population
(see Fig. 7.7). Scale bars, 1 µm (a, b), 200 nm (c). Figure reproduced from [27]


Table 7.1 Analysis of the regions A-E in the number density shown in Fig. 7.8a


Region   Number   Length (nm)   Density (per µm)
A           268          1324                203
B           405          2065                196
C           379          2147                176
D           293          1713                171
E           237          1378                172
A-E        1582          8627                183
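

The comparison between measured and expected densities is simple arithmetic and can be reproduced from the numbers given in the text and in Table 7.1. The sketch below interprets the Western blot ratio of ~1:6 as one labeled subunit per six unlabeled ones (a fraction of 1/7), which is the reading consistent with the quoted expectation of ~230 per µm.

```python
# (number of molecules, filament length in nm) for regions A-E of Table 7.1
regions = {"A": (268, 1324), "B": (405, 2065), "C": (379, 2147),
           "D": (293, 1713), "E": (237, 1378)}

def density_per_um(count, length_nm):
    """Molecules per micrometer of filament."""
    return 1000.0 * count / length_nm

# Expectation from the structure: 13 dimers every 8 nm along the filament
alpha_tubulin_per_um = 13 / 8 * 1000
expected = alpha_tubulin_per_um / (1 + 6)      # labeled fraction of 1/7

total_count = sum(count for count, _ in regions.values())
total_length = sum(length for _, length in regions.values())
pooled = density_per_um(total_count, total_length)

for name, (count, length) in regions.items():
    print(name, round(density_per_um(count, length), 1))
print("pooled", round(pooled, 1), "expected", round(expected, 1),
      "ratio", round(pooled / expected, 2))
```

The per-region values can differ from the table by about one unit in the last digit, presumably because the tabulated molecule numbers are themselves rounded.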


7.4 Summary 


Ensemble-based, coordinate-targeted fluorescence microscopy methods like STED 
or RESOLFT deliver spatial resolution on the nanometer scale [34]. Registering 
all the molecules at a certain position at the same time in principle provides an 
advantage of recording speed. The repeatability and live-cell compatibility make 
quantitative applications of STED and RESOLFT highly desirable. Reaching this 
goal, however, has so far been challenging, in large measure because the observed
Poisson statistics of the fluorescence signal does not lend itself to separating the
number of the molecules from their brightness.

Fortunately, effects like photon ‘antibunching’ or an additional on-switching step 
of fluorophores manifest themselves as higher-order terms in the observed photon 
statistics. A careful statistical analysis and modeling of the imaging process reveals 
that these higher-order components differ mostly by containing higher powers of the 
molecular brightness. Estimating these components empirically allows one to solve the
inverse problem and retrieve the molecular density and brightness from the observed
non-Poissonian photon statistics. 

The statistical models for both investigated effects for STED and RESOLFT 
nanoscopy are remarkably similar. However, as higher-order effects, their contribu- 
tion to the signal is usually only weak and a sufficiently high signal-to-noise ratio is 
required for the described methods to yield precise results. In particular, the molecules 
should not photobleach during the experiment. 

A central assumption is the independence and uniformity of behavior of the indi- 
vidual molecules, at least locally within a region of interest. A violation of this 
assumption or a compromised (false) estimation of the shape of the MCF will result 
in a biased counting procedure. 

This chapter demonstrated that modeling and analysis of the obtained photon
statistics are key elements to achieving quantitative subdiffraction-resolution
information in STED and RESOLFT nanoscopy.


Chapter 8
Metal-Induced Energy Transfer Imaging

Alexey I. Chizhik and Jörg Enderlein 


Abstract Super-resolution microscopy has seen a tremendous development over the
last two decades. It has opened new perspectives for the application of fluorescence
microscopy in the life sciences. Achieving a spatial resolution beyond the diffraction
limit of light allowed one to observe many biological structures that are not resolvable
in conventional fluorescence microscopy. However, despite the recent development of
super-resolution fluorescence microscopy techniques that squeeze the lateral
resolution down to tens of nanometers, the much poorer axial resolution remains
a key limiting factor for applications where z-sectioning of a sample is needed.
In this chapter, we present a recently developed fluorescence imaging method
called metal-induced energy transfer. It combines unprecedented nanometer
resolution with technical simplicity that allows life science researchers to use it
with standard microscopes. We discuss the basic principle of the method, its theoretical
background, and its applications for imaging of various sub-cellular structures.


PACS Subject Classification: 87.64.M- - 73.20.Mf 


8.1 Introduction 


Fluorescence imaging is one of the most commonly used techniques for the
investigation of biological systems. Among its key advantages are (i) the possibility to
observe live samples in real time, (ii) the technical simplicity that makes it accessible
to a broad community of life-science researchers, and (iii) specific labeling, which
allows one to directly visualize sub-cellular structures. However, the wave nature of
light limits the spatial resolution of a conventional fluorescence microscope, i.e. it
cannot resolve structures smaller than the diffraction limit. In the visible spectral range, this
corresponds to a spatial resolution of roughly half a micrometer along the optical
(z-) axis, and of about a quarter of a micrometer in the xy-plane. The field of
super-resolution microscopy has seen a tremendous development over the last two decades
and has opened up new avenues for the application of fluorescence microscopy in
bio-imaging. However, each of the existing methods is either technically challenging
and requires high light excitation intensities at the limit of what is tolerable for
live-cell imaging, or is rather slow and requires specialized labels and environmental
conditions, which are not always compatible with live-cell microscopy. Moreover, the
majority of these methods suffer from one common problem: their axial resolution
is roughly one order of magnitude worse than their lateral resolution.

In this chapter, we present a new fluorescence-based method called metal-induced 
energy transfer (MIET), which is based on the energy transfer from an optically 
excited donor molecule to a thin metal film. It allows one to achieve an axial local- 
ization of a fluorophore with down to one nanometer accuracy. This goes far beyond 
the diffraction limit of light microscopy and surpasses in accuracy all known light- 
based techniques for enhancing the axial resolution. One of the key advantages of 
this method is that it does not require any hardware modification to a conventional 
fluorescence-lifetime imaging microscope (FLIM), thus preserving its full lateral 
resolution. The technical simplicity of MIET and its compatibility with live-cell
imaging make it applicable to a broad range of studies.

This chapter partly overlaps with recent review papers that discuss various aspects 
of MIET imaging [1, 2]. However, in contrast to the previous publications that focus 
on specific points, such as single molecule imaging using MIET or its comparison 
with other methods for high resolution axial localization, this chapter provides readers
with a general overview of the basic principle of MIET and its potential for bio-imaging.


8.2 Basic Principle and Theory 


It was predicted by Edward Purcell in 1946 [3] that placing a fluorescent molecule in 
the vicinity of a metal quenches its fluorescence emission and decreases its excited 
state lifetime. From a physics point of view, the mechanism behind this phenomenon 
is similar to that of FRET [4]: energy from the excited molecule is transferred, 
via electromagnetic coupling, into plasmons of the metal, where energy is either 
dissipated or re-radiated as light. This fluorophore-metal interaction was extensively
studied in the 1970s and 1980s [5], and a quantitative theory was developed on the basis of
semi-classical quantum optics [6, 7]. The achieved quantitative agreement between
experimental measurement and theoretical prediction was excellent (Fig. 8.1). 
Because the energy transfer rate depends on the distance of a
molecule from the metal layer, the fluorescence lifetime can be directly converted
into a distance value (Fig. 8.2). The theoretical basis for the success of this conversion
is the perfect quantitative understanding of MIET [8]. It is important to emphasize
that the energy transfer from the molecule to the metal is dominated by the interaction
of the molecule's near-field with the metal and is thus a thoroughly near-field effect,
similar to FRET. However, due to the planar geometry of the metal film, which acts
as the acceptor, the distance dependency of the energy transfer efficiency is much
weaker than the sixth power of the distance, which leads to a monotonic relation
between lifetime and distance over a size range between zero and ~250 nm above
the surface.


Fig. 8.1 Calculated dependence of fluorophore lifetime on its height over the metal film. Curves
are calculated for an emission wavelength of 650 nm and a gold film thickness of 20 nm deposited
on the glass substrate


Fig. 8.2 Geometry of MIET sample: A fluorophore is placed above a thin metal film deposited
on glass. Fluorescence detection is done with a high numerical aperture objective lens from the
glass side. Fluorescence excitation is performed by the same lens. In the figure, an electric dipole
emitter is placed at a distance z from the metal film. Its orientation is described by the angle β
between its dipole axis and the optical (vertical) axis. The angular distribution of radiation into the
glass is depicted as a red curve and is a function of angle θ. The critical angle θ_cr of total internal
reflection between glass and water is also shown

Evaluation of MIET measurements can be done by modeling the emission prop- 
erties of an emitter above a metal surface. The geometry of the modeled situation is 
shown in Fig. 8.2.

Let us consider the emission of a single molecule with orientation angles (α, β),
where β denotes the inclination towards the vertical axis and α the angle around that
axis. The molecule is assumed to be an electric dipole emitter. Then, the electric field
amplitude of its emission into direction (θ, φ) is given by the general formula

\[ \mathbf{E}_{\mathrm{em}} = \hat{\mathbf{e}}_p \left[ A_\perp \cos\beta + A_\parallel^{c} \sin\beta \cos(\phi - \alpha) \right] + \hat{\mathbf{e}}_s A_\parallel^{s} \sin\beta \sin(\phi - \alpha), \tag{8.1} \]

where A_⊥, A_∥^c and A_∥^s are functions of the emission angle θ but not of α, β or φ.
Explicit expressions for these functions can be found in a standard way by expanding
the electric field of the dipole emission into a plane wave superposition and tracing
each plane wave component through the planar structures using Fresnel's relations;
for details see [9-12]. It is important to note that these functions also depend on
wavelength. Knowing the electric field amplitude of the emission into a given
direction (θ, φ), one can then derive the total power of emission as

\[ S_{\mathrm{total}}(\beta, \alpha) \propto B_\perp \cos^2\beta + B_\parallel \sin^2\beta \tag{8.2} \]

with weight factors B_⊥ and B_∥ which also take into account the absorption of emitted
energy within the metal layer; for details of their calculation see [9]. Knowing the
total emission power S_total, one can then calculate the lifetime τ of the molecule by

\[ \frac{\tau}{\tau_0} = \frac{S_0}{\Phi\, S_{\mathrm{total}} + (1 - \Phi)\, S_0}, \tag{8.3} \]

where S_0 is the total emission power of the emitter in free space (sample space), Φ is
the quantum yield, and τ_0 is the free-space excited-state lifetime of the emitter. For
calculating the lifetime-distance curve, one has to average the result over all possible
molecular orientations (assuming that there is no preferred molecular orientation in
the sample) and the emission spectrum of the emitter (using the free-space emission
spectrum as weight function).
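
The lifetime relation can be evaluated directly once the ratio S_total/S_0 is known. The sketch below assumes the form τ/τ_0 = S_0 / (Φ S_total + (1 − Φ) S_0), which follows from scaling only the radiative rate by S_total/S_0 while the non-radiative rate stays unchanged. The function name and the dye parameters (τ_0 = 3 ns, Φ = 0.8) are hypothetical, and the electrodynamic calculation of S_total itself is not attempted here.

```python
def miet_lifetime(s_ratio, quantum_yield, tau0):
    """Excited-state lifetime for a given emission power ratio S_total / S_0."""
    if not 0.0 <= quantum_yield <= 1.0:
        raise ValueError("quantum yield must lie in [0, 1]")
    if s_ratio < 0.0:
        raise ValueError("emission power ratio must be non-negative")
    return tau0 / (quantum_yield * s_ratio + (1.0 - quantum_yield))

# Hypothetical dye: free-space lifetime 3 ns, quantum yield 0.8
for s_ratio in (1.0, 2.0, 5.0):
    print(s_ratio, round(miet_lifetime(s_ratio, 0.8, 3.0), 3))
```

A ratio of 1 returns the free-space lifetime, and a larger ratio (stronger coupling to the metal, i.e. a smaller height) shortens the lifetime. For a quantum yield of zero the lifetime does not depend on the environment at all, so dyes with a high quantum yield respond more strongly to the metal.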

Experimentally, one needs a standard scanning confocal microscope that allows 
one to do fluorescence lifetime imaging (Fig. 8.3), that is, equipped with a pulsed 
excitation laser and a single photon avalanche diode. The only addition that is required 
for MIET imaging is coating the substrate with a semitransparent metal film, typically
10-15 nm thick. Gold as a coating material combines such crucial properties as
non-toxicity for living cells, absence of oxidation, and high transparency compared to
other metals.


8.3 The MIET-GUI Software 


We have developed a Matlab-based MIET-GUI for analysis of measured data. The
MIET-GUI is a graphical user interface designed for various types of data evaluation,
for instance the conversion of the raw FLIM data into a MIET image. The software
can be downloaded via the link www.joerg-enderlein.de/MIET/MIETGUI.zip. The
MIET-GUI accepts .ht3 and .ptu files generated by the FLIM-hardware HydraHarp
of PicoQuant GmbH (Berlin), from which it calculates the lifetime and intensity for
every pixel of an image, elliptical regions of interest (ROI) or the patterns generated
by scanning the excitation light over single dipole emitters. These lifetimes are
converted into height information via the MIET lifetime versus height calibration curve
(Fig. 8.4).


Fig. 8.3 Schematic of the experimental set-up for MIET imaging

As a first step, the user has to choose the general type of evaluation, pixel-by-pixel 
or one of the more elaborate ROI/pattern techniques. In the pixel-by-pixel mode, the 
time-correlated single photon counting (TCSPC) histogram of each pixel with more 
than 25 photons is assembled. The shape of these histograms can be described by a 
steep rise followed by a peak and then an exponential decay. By setting a cutoff after 
which the curve is purely exponential and calculating the mean arrival time of the 
photons after this cutoff, one gets the lifetime value for this pixel. In the ROI mode, 
the user specifies an elliptical region of interest believed to belong to molecules 
with the same lifetime. The photons from all pixels within the ROI are collected 
into a single histogram, which is less prone to noise problems than histograms for 
single pixels. For this reason, the histogram can be fit with either mono- or multi- 
exponential decay curves, thus finding the lifetime of the molecules in the ROI. The 
most sophisticated mode is the pattern matching mode. Here, the user has to specify 
the parameters of the excitation light such as the wavelength, the polarization mode of 
the laser, the numerical aperture of the objective and the defocusing of the objective. 
From these parameters, the patterns generated by scanning the excitation beam over 
molecules with different angular orientations can be calculated. The intensity image 
obtained by integrating the TCSPC data over time is now fitted with the simulated
patterns to determine the position and orientation of each single dipole emitter. The
photons from all the pixels assigned to a molecule's pattern are grouped into a single
histogram and fitted as in the ROI mode.


Fig. 8.4 Interface of the Matlab-based MIET-GUI software for the analysis of raw data
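

The pixel-by-pixel lifetime estimate described above, the mean photon arrival time after a cutoff, can be sketched in a few lines. This is not the MIET-GUI implementation; it is a minimal illustration that assumes a mono-exponential tail, a TCSPC window much longer than the lifetime, negligible background, and simulated arrival times with a hypothetical lifetime of 2.5 ns.

```python
import random

def tail_lifetime(arrival_times, cutoff, min_photons=25):
    """Mean arrival time after the cutoff, measured from the cutoff.

    For a mono-exponential tail this mean equals the lifetime, because an
    exponential decay is memoryless. Pixels with too few photons are skipped.
    """
    if len(arrival_times) <= min_photons:
        return None
    tail = [t - cutoff for t in arrival_times if t >= cutoff]
    if not tail:
        return None
    return sum(tail) / len(tail)

rng = random.Random(3)                    # fixed seed for reproducibility
tau_true = 2.5                            # ns, hypothetical lifetime
photons = [rng.expovariate(1.0 / tau_true) for _ in range(20_000)]
estimate = tail_lifetime(photons, cutoff=0.5)
print(estimate)
```

In real data the finite TCSPC window truncates the tail and background photons add a constant offset, so both bias this simple estimator; this is one reason why the ROI and pattern modes fit the pooled histogram instead.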

In the second step, the lifetime information is converted into height information. 
To this end, the user has to specify the emission wavelength, the quantum yield 
and the excited state lifetime of the emitters as well as the thicknesses and complex 
refractive indices of all materials in the sample (e.g. metal-coated glass cover slides, 
buffer solutions etc.). As described above, this data can be used to calculate the 
observed lifetime as a function of the dipole’s height above the interface and its 
angle with the optical axis. In the pixel-by-pixel evaluation mode, nothing is known 
about the particle’s orientation, so a random orientation is assumed and the calibration 
curve calculated accordingly. In the pattern matching mode, the particle’s orientation 
is known and the correct curve is used for the evaluation. If the emission spectrum of 
the fluorescent probe is known, the calibration curves obtained for all wavelengths 
that are able to pass the optical filters are calculated and averaged according to the 
spectrum. A complication arises from the fact that the lifetime versus height curve 
oscillates, meaning that some lifetime values cannot be matched unambiguously to 
a height value. The first possibility for solving this problem is to crop the calibration 
curve at the largest unique value and to mark all longer lifetimes as ‘not a number’ 
in the height image. If prior knowledge about the sample establishes that no 
height values larger than the value corresponding to the first peak in the calibration 
curve can exist, it is possible to crop the calibration curve at this peak. The height 
information gained through this process can then be visualized or used for further 
analysis. 
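
The conversion step described above can be illustrated with a minimal sketch. It is not the MIET-GUI code: the function name `lifetime_to_height` and the toy calibration values are hypothetical, and the sketch implements only the second strategy (cropping the calibration curve at its first peak and marking lifetimes outside the cropped range as 'not a number').

```python
import numpy as np

def lifetime_to_height(lifetimes_ns, heights_nm, calib_lifetimes_ns):
    """Convert lifetimes to heights using the rising part of a calibration curve.

    heights_nm and calib_lifetimes_ns sample the calculated lifetime-versus-height
    curve. The curve is cropped at its first local maximum, so every lifetime in
    the cropped range maps to exactly one height. Other lifetimes become NaN.
    """
    heights_nm = np.asarray(heights_nm, dtype=float)
    calib = np.asarray(calib_lifetimes_ns, dtype=float)

    # Index of the first sample at which the curve stops rising (first peak).
    not_rising = np.nonzero(np.diff(calib) <= 0)[0]
    i_peak = int(not_rising[0]) if not_rising.size else calib.size - 1
    tau_mono = calib[: i_peak + 1]
    h_mono = heights_nm[: i_peak + 1]

    lifetimes = np.asarray(lifetimes_ns, dtype=float)
    heights = np.interp(lifetimes, tau_mono, h_mono)
    # Lifetimes outside the cropped calibration range cannot be assigned a height.
    heights[(lifetimes < tau_mono[0]) | (lifetimes > tau_mono[-1])] = np.nan
    return heights

# Toy calibration curve (hypothetical numbers): rises up to 100 nm, then oscillates.
toy_heights = [0.0, 50.0, 100.0, 150.0, 200.0]
toy_lifetimes = [0.5, 1.5, 2.5, 2.2, 2.4]
print(lifetime_to_height([1.0, 2.0, 2.6], toy_heights, toy_lifetimes))
```

The same function can be applied to a whole lifetime image, since `np.interp` accepts arrays of any length once the image is flattened or passed row by row.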

8.4 Metal-Induced Energy Transfer for Biological Imaging 


The applicability of MIET for live-cell imaging was first demonstrated by mapping 
the basal membrane of living cells with nanometer accuracy [13]. Knowledge of the 
precise cell-substrate distance as a function of time and location with unprecedented 
resolution provides a new means to quantify cellular adhesion and dynamics, as is 
required for a deeper understanding of fundamental biological processes such as cell 
differentiation, tumor metastasis and cell migration. 

As a biological model system, three adherent cell lines were chosen: 
MDA-MB-231 human mammary gland adenocarcinoma cells and A549 human 
lung carcinoma cells, which are able to form metastases in in vivo models, as well as 
MDCK-II cells from canine kidney tissue as a benign epithelial cell line. Interestingly, 
significant differences in the cell-interface distance between a normal epithelial cell 
and cancerous cell lines were observed. 

Figure 8.5a and b show the measured intensity and lifetime images that were used 
to obtain the 3D reconstruction of the basal cell membrane. Because the variation 
of the fluorescence intensity is not only dependent on the metal-induced quenching, 
but also on the homogeneity of labelling, exclusively the lifetime information was 
used for reconstructing a three-dimensional map of the basal membrane. On the other 
hand, the intensity distribution was used to discriminate the membrane fluorescence 
against the background. Regions with no cells are difficult to identify from the lifetime 
images alone, as the lifetime values can become exceedingly scattered at low 
signal-to-noise ratios. Figure 8.5c shows the result of converting the lifetime image 
into the 3D height profile. 
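
The use of the intensity image to separate membrane fluorescence from background can be sketched as a simple photon-count threshold. This is only an illustration under our own assumptions: the function name `mask_background` and the threshold value are hypothetical and are not taken from the published analysis.

```python
import numpy as np

def mask_background(height_img_nm, intensity_img, min_photons):
    """Return a copy of the height image with low-intensity pixels set to NaN.

    Pixels with fewer than min_photons detected photons are treated as
    background, because their lifetime (and hence height) values are unreliable.
    """
    masked = np.array(height_img_nm, dtype=float)  # copy, so the input is untouched
    masked[np.asarray(intensity_img) < min_photons] = np.nan
    return masked

# Toy 2 x 2 example with hypothetical values.
heights = [[10.0, 20.0], [30.0, 40.0]]
photons = [[5, 500], [800, 2]]
print(mask_background(heights, photons, min_photons=100))
```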

The relatively fast scanning speed of the confocal microscope used for MIET 
imaging makes it possible to monitor dynamic processes. Figure 8.6 shows the spreading 
behaviour of MDCK-II cells. Generally, the spreading process of adherent cells 
can be divided into three distinct temporal phases. The first phase is characterized 
by the formation of initial bonds between adhesion molecules and molecules of the 
extracellular matrix. This process of tethering is followed by the second phase, which 
comprises the initial cell spreading and is driven by actin polymerization. The 
latter forces the cell surface area to increase by drawing membrane from a reservoir 
of folded regions. The third phase encompasses recruitment of additional plasma 
membrane from the internally stored membrane buffer and extension of lamellipodia 
to occupy a larger area. 

Fig. 8.5 Simultaneously acquired fluorescence intensity (a) and lifetime (b) images of the basal 
membrane of living MDA-MB-231 cells grown on a gold-covered glass substrate, acquired with 
a standard confocal microscope. c Three-dimensional reconstruction of the basal cell membrane. 
Three-dimensional profiles computed from the fluorescence lifetime image (b) 

Fig. 8.6 Time-lapse MIET images recorded at 5 min intervals showing the late stages of 
cell (MDCK-II) spreading on gold. The cell forms tightly attached protrusions/lamellipodia away 
from the center of the cell. The cell occupies a larger area with time and presses down more closely. 
A darker color corresponds to a lower cell-substrate distance. At later stages (k-n) first lamellipodia are 
formed that exhibit a low cell-substrate distance 

The axial resolution of the recorded images can be determined by calculating the 
standard deviation of cell-substrate distance. The resolution depends on the photon 
rate and varies between 2 and 4 nm for typical fluorescence intensities measured and 
can be further enhanced to 1 nm by increasing the number of detected photons. 
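
The dependence of the axial resolution on the number of detected photons can be made plausible with a short estimate. It assumes an idealized, background-free mono-exponential decay, which is our simplification and not a statement from the measurements: in that case the uncertainty of a lifetime estimated from $N$ photons scales as $\tau/\sqrt{N}$, and it propagates to the height through the local slope of the calibration curve.

```latex
\sigma_\tau \approx \frac{\tau}{\sqrt{N}}, \qquad
\sigma_z \approx \frac{\sigma_\tau}{\left|\mathrm{d}\tau/\mathrm{d}z\right|}
\quad\Longrightarrow\quad
\sigma_z \propto \frac{1}{\sqrt{N}}
```

Under this scaling, improving the resolution from about 2 nm to 1 nm would require roughly four times as many detected photons.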

The unprecedented axial resolution of MIET allowed us to monitor the 
cell-substrate distance of epithelial NMuMG cells during the biological process of the 
epithelial-to-mesenchymal transition (EMT) (Fig. 8.7) [14]. EMT allows epithelial cells to 
enhance their migratory and invasive behavior and plays a key role in embryogenesis, 
fibrosis, wound healing, and metastasis. Among the multiple biochemical changes 
from an epithelial to a mesenchymal phenotype, the alteration of cellular dynamics 
in cell-cell as well as cell-substrate contacts is crucial. It was shown that, in the 
very first hours of the transition, the cell-substrate distance increases by several tens 
of nanometers, but later in the process, after the mesenchymal state is reached, this 
distance is reduced again to the level of untreated cells. 

Dual-color MIET allowed for reconstructing the 3D profile of the nuclear envelope 
over the whole basal area of HeLa cells [15]. The profilometry was done by measuring 
the axial distance between the proteins Lap2β and Nup358 as components of the 
nuclear envelope and the nuclear pore complex, with defined localizations at the 
inner nuclear membrane and the cytoplasmic side of the protein complex, respectively 
(Fig. 8.8). The obtained thickness of the nuclear envelope of 30-35 nm is in very good 
agreement with the values that were obtained using electron microscopy. This study 
has shown that optical microscopy allows one not only to measure the distance 
between the outer and inner nuclear membrane but also to reconstruct its 3D profile 
over the whole basal area. 

Fig. 8.7 Average cell membrane-substrate distance of untreated (blue) and TGF-β1-treated 
NMuMG cells (red) over time. NMuMG cells detach from the surface by more than 20 nm on 
average in response to TGF-β1 administration. After 20 h the initial cell-substrate distance is restored. 
The standard error of the mean is illustrated as a colored area around the data points 

Fig. 8.8 Schematic of the positions of Lap2β and Nup358 in the inner nuclear membrane and the 
nuclear pore complex, respectively. HeLa cells were fixed and subjected to indirect 
immunofluorescence using goat anti-Nup358. Three-dimensional height profiles of the inner (top) and outer 
(bottom) nuclear membrane of a typical HeLa cell nucleus, as determined by MIET imaging. The 
outer nuclear membrane roughly follows the profile of the inner nuclear membrane 

Fig. 8.9 3D architecture of stress fibers at focal adhesions changes from 12 to 24 h. Height profiles 
along actin filaments and vinculin complexes after 12 and 24 h. Images a and b correspond to 
intensity-weighted ensemble heights of actin and vinculin, respectively, for a cell fixed 12 h after 
seeding. Images d and e correspond to intensity-weighted ensemble heights of actin and vinculin, 
respectively, for a cell fixed 24 h after seeding. White points (1), (2), and (3) on the 
intensity-weighted height images indicate the starting points of the height profiles shown in images (c) and 
(f). They show the height of actin filaments (circles) and vinculin clusters (triangles) at the same 
focal adhesion. The shaded areas mark the 1σ-regions of the height values. Scale bar is 10 µm 

Recently, dual-color MIET was combined with Förster resonance energy transfer 
(FRET) for studies of cytoskeletal elements and adhesions in human mesenchymal 
stem cells [16]. In addition to resolving nanometric structural details along the z-axis 
using MIET, FRET was used to measure the distance between actin and vinculin at 
focal adhesions. The analysis of the temporal evolution of actin heights shows that 
the actin filaments move closer to the surface while the cell is spreading and firmly 
adhering (Fig. 8.9). Although the fibers are distributed over a broad height range 
during an early phase (1-6 h), their distance to the surface decreases to 40 nm at around 
12 h and later time points. On the other hand, during maturation of focal adhesion 
complexes, vinculin aggregates grow larger, as indicated by an increase in height, while 
the mean height of the actin bundles above the surface decreases. The 
nanometer-precise height information along the fibers and of the vinculin clusters (Fig. 8.9) gives 
a detailed picture of stress fibers anchoring at focal adhesions and spanning the cell 
at a slight inclination of below 1°. 
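
How an inclination angle follows from a height profile along a fiber can be sketched with a straight-line fit. The function name `inclination_deg` and the profile values are hypothetical and serve only to show the arithmetic. They are not data from [16].

```python
import numpy as np

def inclination_deg(lateral_um, height_nm):
    """Inclination angle (degrees) of a fiber from a linear fit of height vs. position."""
    lateral_um = np.asarray(lateral_um, dtype=float)
    height_nm = np.asarray(height_nm, dtype=float)
    slope_nm_per_um = np.polyfit(lateral_um, height_nm, 1)[0]
    # 1 nm per micrometer corresponds to a dimensionless slope of 1e-3.
    return float(np.degrees(np.arctan(slope_nm_per_um * 1e-3)))

# Hypothetical profile: the fiber rises by 80 nm over a lateral distance of 10 micrometers.
angle = inclination_deg([0.0, 5.0, 10.0], [40.0, 80.0, 120.0])
print(f"{angle:.2f} degrees")
```

Because lateral distances are measured in micrometers and heights in nanometers, even a height change of several tens of nanometers along a fiber corresponds to an angle well below one degree.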

The use of single-photon counting detectors for MIET measurements allows one not 
only to achieve nanometer resolution of sub-cellular structures with high labeling 
density, but also to localize single molecules axially with nanometer accuracy. The 
proof-of-principle study was done by Karedla et al., who determined the 
height of dye molecules deposited on a dielectric spacer of known thickness [17]. 
By varying the thickness of the spacer, the authors showed that the axial position 
of molecules can be determined with an accuracy better than 2.5 nm. The excellent 
agreement between the known thickness and the height values that were obtained 
using MIET showed its applicability for single-molecule studies with an accuracy that 
is unachievable with conventional microscopy techniques. 

Isbaner et al. used MIET for colocalizing two single fluorescent emitters along 
the optical axis with nanometer accuracy [18]. For this purpose, the authors used 
stepwise photobleaching to find the fluorescence lifetime values of each emitter on 
one DNA origami pillar, which allowed them to determine their individual heights 
from the surface and thus their mutual axial distance. The determined distance of 
32 ± 11 nm is in excellent agreement with the design value of 32 nm. 


8.5 Conclusions 


The review of applications of MIET shows its versatility and potential for numerous 
applications in live-cell imaging and single-molecule localization. The unique 
combination of technical simplicity and nanometer axial resolution makes it widely 
applicable for studies in the life sciences and nanotechnology. The distance range 
covered by MIET nicely bridges (and complements) the realms of conventional FRET 
and all the recently developed super-resolution imaging techniques. It opens new 
perspectives for nanometer localization of sub-cellular structures in cell focal adhesion 
complexes. Since MIET keeps all the key advantages of conventional fluorescence 
microscopy, it allows simultaneous multi-color imaging of various sub-cellular 
structures. The extremely high photosensitivity of the single-photon avalanche diodes that 
are used as photodetectors for MIET imaging allows single-molecule localization 
with a precision that is unimaginable for conventional optical microscopy. We 
envision further rapid growth in the number of its applications and technical 
development to increase its temporal and spatial resolution. 


Chapter 9 
Reversibly Switchable Fluorescent 
Proteins for RESOLFT Nanoscopy 

Abstract Diffraction-limited lens-based optical microscopy fails to discern 
fluorescent features closer than ~200 nm. All super-resolution microscopy (nanoscopy) 
approaches that fundamentally overcome the diffraction barrier rely on fluorophores 
that can adopt different states, typically a fluorescent ‘on-’state and a dark, non- 
fluorescent ‘off-’state. In reversible saturable optical linear fluorescence transitions 
(RESOLFT) nanoscopy, light is applied to induce transitions between two states 
and to switch fluorophores on and off at defined spatial coordinates. RESOLFT 
nanoscopy relies on metastable reversibly switchable fluorophores. Thereby, it is 
particularly suited for live-cell imaging, because it requires relatively low light levels 
to overcome the diffraction barrier. Most implementations of RESOLFT nanoscopy 
utilize reversibly photoswitchable fluorescent proteins (RSFPs), which are deriva- 
tives of proteins from the green fluorescent protein (GFP) family. In recent years, 
analyses of the molecular mechanisms of the switching processes have paved the way 
to a rational design of new RSFPs with superior characteristics for super-resolution 
microscopy. In this chapter, we focus on the newly developed RSFPs, the light-driven 
switching mechanisms and the use of RSFPs for RESOLFT nanoscopy. 


9.1 Overcoming the Diffraction Barrier 


Optical fluorescence microscopy makes it possible to discern protein distributions in cells. Over 
the last century, cell biology experienced countless discoveries based on the 
microscopic visualization of (sub-)cellular structures and their dynamics. Eukaryotic cells, 
which generally have a diameter of 10-300 µm, are relatively small objects crowded 
with proteins. Hence, conventional optical microscopy, whose resolution is 
fundamentally limited by diffraction to about 200 nm, inescapably faces the fact that the 
finest details are not resolvable. 

Resolution in microscopy means the separation of distinct features, such as 
fluorescently labelled proteins. In conventional diffraction-limited microscopy, such as 
widefield (epifluorescence) or confocal laser scanning microscopy, all fluorophores 
in closer proximity to each other than the diffraction limit are excited together, 
emit together, and their emissions diffract together; therefore, they are detected 
together [1]. Hence, with these approaches, structures closer than the diffraction 
limit are inseparable. The key to fundamentally overcoming the diffraction barrier is to 
render adjacent molecules discernible for a short period of time, preventing different 
molecules within the same diffraction region from being detected together [2]. This 
separation can be implemented either in a coordinate-targeted or in a 
coordinate-stochastic way (for reviews see [1, 3]). 

In coordinate-targeted super-resolution microscopy techniques such as RESOLFT 
nanoscopy, a light pattern is used to induce transitions between two states and to 
switch fluorophores on and off at defined spatial coordinates. In the simplest scenario, 
a single beam creating one intensity minimum in a doughnut-shaped pattern is used, 
but approaches with several or many minima are also possible. At these minima 
(zeros), there is no off-switching, and the fluorophores in the on-state can fluoresce. 
These light patterns are scanned over the sample, in order to record a full image. 
Such scanning approaches require multiple on-off-cycles of the fluorophores. 


9.2 RSFPs for Live-Cell RESOLFT Nanoscopy 


In its initial definition, the term ‘RESOLFT nanoscopy’ covered all coordinate- 
targeted nanoscopy approaches relying on two distinct (fluorophore) states including 
STED nanoscopy [4]. Later, this term was primarily used for coordinate-targeted 
nanoscopy approaches that rely on metastable fluorophores, most prominently on 
reversibly (photo-)switchable fluorescent proteins (RSFPs). Compared to other 
super-resolution microscopy techniques that overcome the diffraction barrier, 
RESOLFT nanoscopy requires a remarkably low light dose to achieve nanoscale 
resolution. The light intensities used are similar to those applied in live-cell confocal 
fluorescence microscopy and up to six orders of magnitude lower than those in STED 
microscopy. The total light dose deposited on the sample is lower by 3-4 orders of 
magnitude compared to coordinate-stochastic nanoscopy [5, 6]. As the light 
intensity is an important factor that determines phototoxicity [7], RESOLFT nanoscopy 
is particularly suitable for live-cell imaging approaches. 

RESOLFT microscopy relies on fluorophores that can be reversibly 
photoswitched between two metastable states. In the overwhelming majority of 
applications, these have been reversibly photoswitchable fluorescent proteins. RSFPs are 
structurally highly similar to the green fluorescent protein (GFP). All GFP-based 
fluorescent proteins share the same overall structure of 11 β-sheets, forming a β-barrel 
with an α-helix running through the center. The chromophore is autocatalytically 
formed out of three amino acids within the α-helix, requiring only oxygen as an 
external cofactor for its maturation (Fig. 9.1) [8]. In GFP, the chromophore consists 
of a hydroxyphenyl ring and an imidazolinone ring connected by a methine bridge. 
By rotation around the methine bridge, the chromophore can adopt either a cis or 
a trans conformation. The hydroxyphenyl ring can be protonated or deprotonated, 
thereby shifting the absorption spectrum by about 80-120 nm. The β-barrel shields 
the chromophore in its center, while numerous noncovalent bonds to surrounding 
amino acid residues and internal water molecules determine the chromophore 
position within the protein and its protonation state. The spectral properties of a specific 
RSFP are a result of the chromophore structure as well as of its interactions with the 
surrounding residues; therefore, those positions are key targets for mutagenesis to 
modify the properties of RSFPs. 

Fig. 9.1 Structure of EGFP and its intrinsic chromophore. a Side view of the GFP barrel. b Top 
view of the GFP barrel with the internal chromophore in its center (PDB: 2Y0G). c Structure of 
the GFP chromophore. The chromophore consists of an imidazolinone and a hydroxyphenyl ring 
connected by a methine bridge 


9.3 Photoswitching Mechanisms of RSFPs 


Conventional RSFPs can be classified according to their switching mode, i.e. a 
‘positive’ or a ‘negative’ switching mode (Fig. 9.2) [9]. In negative switching RSFPs, 
the same wavelength that induces fluorescence also switches the RSFP from the 
on- to the off-state. In contrast, in positive switching RSFPs, the light that induces 
fluorescence also transfers the protein from the off- to the on-state. Thereby, in 
conventional RSFPs, switching and fluorescence excitation are directly interconnected. 
The mechanistic principles of switching were initially revealed by spectroscopy 
and crystallography of the RSFPs asFP595 and Dronpa in their respective on- and 
off-states [10, 11]. In recent years, numerous further studies [12-16], including detailed 
molecular dynamics studies, ultrafast spectroscopy and crystallography, have led to 
impressive insights into the details of the light-driven switching mechanism [17-20]. 
The mechanistic key event of the switching of all conventional RSFPs analyzed so 
far is a light-induced cis-trans isomerization of the chromophore, often combined 
with a protonation change [13]. This isomerization can be accompanied by shifts of 
the planarity of the chromophore and/or conformational changes of the chromophore 
pocket and modifications of the hydrogen-bonding network around the chromophore, 
influencing the chromophore’s ability to fluoresce. Almost all reported 
crystallographic structures of conventional RSFPs feature a trans-conformation of the 
chromophore in the off-state and a cis-conformation in the on-state [21]. However, it 
should be noted that, while the majority of fluorescent proteins contain a chromophore 
in the cis-conformation, bright fluorescent proteins containing a chromophore in 
the trans-state exist [22]. The only RSFPs reported so far with a different molecular 
switching mechanism are the yellow fluorescent RSFP Dreiklang [23] and its 
descendant Spoon [24] (Fig. 9.2c). In Dreiklang, fluorescence excitation is decoupled from 
switching, i.e. one wavelength is used for fluorescence excitation (515 nm), another 
for switching off (405 nm), and a third for switching on (355 nm). In Dreiklang, 
a light-driven reversible hydration/dehydration of the chromophoric imidazolinone 
ring causes switching by reversibly disrupting the π-conjugated electron system. 

Fig. 9.2 Schematic overview of the different switching mechanisms. a Negative switching RSFPs 
can be switched off with light of the wavelength used for fluorescence excitation. b Positive 
switching RSFPs are switched on by the fluorescence excitation light. c Proteins with decoupled switching 
are excited with light of a different wavelength than used for on- or off-switching. For each 
switching mechanism, schematic chromophore structures in the on- and the off-state, examples for the 
switching wavelengths, and the respective absorption and emission spectra are shown 

In the following, we will take a closer look at the different switching modes and 
their application in RESOLFT imaging. 


9.3.1 Negative Switching Mode 


Negative switching RSFPs are most commonly used for RESOLFT nanoscopy. In 
the dark state, the chromophore is found in the protonated trans-conformation, while 
in the on-state the chromophore is deprotonated in the cis-conformation (Fig. 9.2a). 
Excitation of the on-state chromophore results in fluorescence (quantum yield 
typically 0.1-0.9), switching (quantum efficiency typically in the range of 0.01) [25], or 
other non-radiative processes. In comparison, the quantum efficiency for switching 
the protein from the off- to the on-state is generally substantially higher (quantum 
efficiency typically in the range of 0.1). The wavelength used for switching the 
RSFP from the off- into the on-state is usually 80-120 nm shorter than the 
wavelength for switching off and fluorescence excitation. The precise sequence of the 
intra-molecular events during the switching process, particularly the sequence of 
isomerization and protonation change, has been controversially discussed [21]. 
Recent results for Dronpa and IrisFP conclusively suggested that during the 
off-to-on transition the chromophore isomerization precedes the protonation change [20, 
26]. 

In single-beam scanning RESOLFT nanoscopy, negative switching RSFPs are 
generally utilized according to the following imaging scheme: First, the proteins are 
switched to the on-state with a Gaussian beam, then the proteins in the periphery 
are switched off with a doughnut-shaped beam, and subsequently the remaining 
fluorescence in the center (whose size is smaller than the diffraction limit) is read 
out. The on-switching process generally has a higher quantum efficiency than the 
off-switching process. Thus, the off-switching step is the most time-consuming step, 
although fast-switching proteins have been developed that strongly reduce the dwell 
time [5]. However, too high a quantum efficiency for switching off is also not 
desirable, as switching competes with fluorescence excitation in the case of RSFPs with 
a negative switching mode. 
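
The three-step imaging scheme for negative switching RSFPs can be illustrated with a one-dimensional toy model. Everything in it is an assumption made for illustration: the Gaussian beam of 200 nm width, the sin²-shaped doughnut, the first-order off-switching kinetics and the parameter `saturation` (off-switching rate at the doughnut crest multiplied by the dwell time) are not taken from a specific experiment.

```python
import numpy as np

def fwhm(x, y):
    """Full width at half maximum of a single-peaked profile y sampled at x."""
    above = x[y >= 0.5 * y.max()]
    return above[-1] - above[0]

def resolft_profile(x_nm, beam_fwhm_nm=200.0, saturation=20.0):
    """Effective fluorescence profile after on-switching, off-switching and readout."""
    sigma = beam_fwhm_nm / (2.0 * np.sqrt(2.0 * np.log(2.0)))
    # Steps 1 and 3: Gaussian beam for on-switching and for fluorescence readout.
    gaussian = np.exp(-x_nm**2 / (2.0 * sigma**2))
    # Step 2: doughnut with zero intensity at x = 0 and its crest at +/- beam_fwhm_nm.
    doughnut = np.sin(np.pi * x_nm / (2.0 * beam_fwhm_nm)) ** 2
    # Fraction of proteins still in the on-state (first-order off-switching).
    still_on = np.exp(-saturation * doughnut)
    return gaussian * still_on

x = np.linspace(-400.0, 400.0, 8001)
confocal = resolft_profile(x, saturation=0.0)   # no off-switching step
resolft = resolft_profile(x, saturation=20.0)   # with off-switching step
print(fwhm(x, confocal), fwhm(x, resolft))
```

In this toy model, a larger `saturation` (longer off-switching or higher doughnut intensity) narrows the remaining on-state region further, which mirrors why the off-switching step dominates the pixel dwell time.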


9.3.2 Positive Switching Mode 


All reported RSFPs with positive switching characteristics, including Padron [9], 
rsCherry [27] and asFP595 [28], share an Anthozoa heritage. In the off-state, the 
chromophore of the positive switching RSFPs is generally in the trans-state and 
deprotonated (Fig. 9.2b). Excitation into the absorption band of the deprotonated 
chromophore induces chromophore isomerization to the cis-state. In the cis-state, 
the protein is in an equilibrium between the protonated and the deprotonated state. 
Excitation of the protonated cis-chromophore switches the protein to the trans 
off-state, while excitation of the deprotonated cis-chromophore excites fluorescence. 
The protonation equilibrium of the chromophore in the cis- and the trans-states 
is directly influenced by the amino acids forming the chromophore pocket. 
Reportedly, the exchange of a few amino acids can induce a change in the switching 
mode of the respective fluorescent protein [9, 12, 27]. 

Using positive switching RSFPs, the optical system for beam scanning RESOLFT 
nanoscopy can be reduced to a Gaussian beam for switching on and fluorescence 
excitation and a doughnut-shaped beam for switching the protein off [29]. A potential 
advantage of RESOLFT imaging with positive switching fluorescent proteins is that 
the number of emitted photons during readout is not limited by a competing off- 
switching process. 


9.3.3 Decoupled Switching Mode 


The switching of Dreiklang and its descendants differs from the switching of con- 
ventional RSFPs by the fact that the excitation of fluorescence does not induce sub- 
stantial switching [23]. The chromophore of the on-state protein is in an equilibrium 
between a protonated and a deprotonated state (Fig. 9.2c). Excitation of the depro- 
tonated on-state chromophore at 515 nm results in fluorescence, while excitation of 
the protonated chromophore at 405 nm leads to a hydration of the C65 atom of the 
imidazolinone ring, thereby switching the protein to the off-state. More specifically, 
upon excitation into the absorption band of the protonated chromophore, an ultrafast 
excited state proton transfer occurs, presumably leading to a charge transfer to the 
imidazolinone ring, which subsequently is protonated by Glu222, catalyzing the 
addition of a water molecule [30]. By this water addition, the π-conjugated electron 
system is shortened, resulting in the emergence of an absorption band at 395 nm. 
Excitation of this band results in a water elimination reaction, thereby converting 
the protein back to the on-state. The positioning of the water molecule reacting with 
the chromophore and the protonation state of the chromophore are crucial for the 
efficiency of this switching mechanism. This process is dependent on at least three 
key amino acids (Gly65, Tyr203, Glu222) [23]. 

The decoupling of switching from fluorescence excitation enables full control of 
the state of the protein, which is desirable in RESOLFT nanoscopy. Dreiklang has 
been used for RESOLFT nanoscopy [23, 31], although it is outperformed by other 
RSFPs, because some of its other properties (see below) are less favorable for this 
imaging modality. 


9.4 RSFP Properties Important for RESOLFT Nanoscopy 


The usefulness of a specific RSFP for live-cell RESOLFT nanoscopy depends on 
several factors. On the one hand, any usable RSFP must be a suitable fusion tag 
with a negligible dimerization tendency and a fast maturation rate. On the other 
hand, it needs to exhibit a combination of favorable photophysical characteristics. 
The four most important parameters for RESOLFT imaging are the brightness of the 
protein in its on-state, its switching speed, the residual fluorescence background in 
the ensemble off-state, and the switching fatigue (Fig. 9.3). These parameters, all of 
which can be influenced by mutagenesis, are discussed in detail in the following. 


9.4.1 Brightness 


For a conventional fluorophore, the molecular brightness is defined as the product 
of the extinction coefficient and the quantum yield. RSFPs can adopt two different 
states with different quantum yields and in most cases the two states exhibit distinct 
absorption spectra. Generally, the molecular brightness of the off-state is very small 
but not necessarily zero. Next to the molecular brightness of the protein in the on- 
state, two other parameters, namely the effective brightness (of an ensemble of fully 
matured RSFPs in solution) and the effective cellular brightness (of RSFPs in a living 
cell) are used to describe and compare RSFPs. In a switching curve (Fig. 9.3a), the 
effective brightness is measured as the fluorescence intensity of the protein ensemble 
when switched fully into the on-state. In absolute terms, it is the average molecular 
brightness of the on-switched protein solution. As an RSFP solution may adopt an 
equilibrium between the on- and the off-state [9, 27], the effective brightness can 
differ drastically from the brightness at equilibrium. 

[Plot: normalized fluorescence versus time, with the readout/off-switching intervals marked] 

Fig. 9.3 Important switching parameters. Exemplary switching curve of negative switching RSFPs 
when switched consecutively on and off with light of different wavelengths. Plotted is the emitted 
fluorescence intensity. Four key parameters can be determined from these switching curves. a 
Effective brightness of the RSFP in the on-state. b Switching speed. c Residual fluorescence in the 
off-state. d Switching fatigue 

The effective cellular brightness of a specific RSFP is influenced by a number of 
factors, such as its maturation time, its turnover-rate, its expression rate, its usability 
as a tag, temperature, local pH and other factors that are difficult to pinpoint. As RSFPs are 
primarily utilized for in vivo imaging, the effective cellular brightness is a crucial 
parameter when RSFPs are compared. The cellular brightness can vary strongly 
between different experimental settings [32, 33] and therefore the data from different 
studies are difficult to compare. 
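
The definition of the molecular brightness given at the beginning of this section can be written as a one-line calculation. The function name and the numerical values below are hypothetical and only illustrate that the off-state brightness is small but not necessarily zero.

```python
def molecular_brightness(extinction_coefficient, quantum_yield):
    """Molecular brightness: extinction coefficient (M^-1 cm^-1) times quantum yield."""
    if not 0.0 <= quantum_yield <= 1.0:
        raise ValueError("quantum yield must lie between 0 and 1")
    return extinction_coefficient * quantum_yield

# Illustrative, hypothetical values for the two states of one RSFP.
on_state = molecular_brightness(50_000, 0.6)
off_state = molecular_brightness(1_000, 0.02)
print(on_state, off_state)
```

The effective and the effective cellular brightness cannot be computed this way, because they additionally depend on the on/off equilibrium and on cellular factors such as maturation and expression.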


9.4.2 Ensemble Switching Speed 


The switching speed of an ensemble of RSFPs is typically in the microsecond to sec- 
ond range, depending on the light intensity applied [5, 25, 34, 35]. Since RESOLFT 
is an ensemble method, the ensemble switching speed is a key parameter to describe 
and compare RSFPs. In a switching curve (Fig. 9.3), this parameter describes the 
time needed to switch the ensemble from the maximal fluorescence intensity to the 
minimal fluorescence intensity (from the on- to the off-state) or vice versa (Fig. 9.3b). 
The speed is dependent on the applied light intensities and wavelengths; applying 
higher light intensities results in faster switching. As switching is a one-photon 
process in both directions [36, 37], the speed and the applied light intensity for the 
switching are largely linearly correlated until saturation of the process sets in. The 
ensemble switching speed is determined primarily by the quantum efficiency for 
switching, by (often ill-defined) intermediate states, and by a crosstalk between the 
on- and the off-switching processes. While high quantum efficiencies for switching 
are beneficial for RESOLFT imaging by shortening the time consuming switching 
step, the connection between switching and fluorescence readout limits the number 
of collected photons in a single switching cycle. For point-scanning RESOLFT in 
living cells, the switching times are generally below 1 ms [5, 6], while in parallelized 
RESOLFT schemes, switching times of tens of milliseconds are suitable [34, 38]. 
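
The linear relation between light intensity and switching speed for a one-photon process can be sketched as follows. The model assumes simple first-order kinetics below saturation and neglects intermediate states and crosstalk. The absorption cross-section and intensity in the example are hypothetical. Only the quantum efficiency of about 0.01 for off-switching follows the typical value quoted in Sect. 9.3.1.

```python
import math

PLANCK = 6.62607015e-34       # Planck constant in J s
LIGHT_SPEED = 2.99792458e8    # speed of light in m / s

def switching_rate(intensity_w_cm2, wavelength_nm, cross_section_cm2, quantum_efficiency):
    """First-order switching rate (1/s) of a one-photon process below saturation."""
    photon_energy = PLANCK * LIGHT_SPEED / (wavelength_nm * 1e-9)   # J per photon
    photon_flux = intensity_w_cm2 / photon_energy                   # photons / (cm^2 s)
    return cross_section_cm2 * quantum_efficiency * photon_flux

def half_time(rate):
    """Time after which half of the ensemble has switched."""
    return math.log(2.0) / rate

# Hypothetical example: 1 kW/cm^2 at 488 nm, cross-section 2e-16 cm^2, efficiency 0.01.
rate = switching_rate(1000.0, 488.0, 2e-16, 0.01)
print(f"rate = {rate:.0f} 1/s, half-time = {half_time(rate) * 1e3:.2f} ms")
```

Doubling the intensity doubles the rate in this model, which holds only until saturation of the process sets in.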


9.4.3 Residual Fluorescence in the Off-State 


The residual fluorescence in the off-state describes the percentage of the fluores- 
cence signal from the off-state compared to the fluorescence signal of the on-state 
(Fig. 9.3c). The switching contrast is defined as the ratio between the fluorescence 
signal of the on-state and the fluorescence signal of the off-state measured on an 
ensemble of proteins.

The switching contrast is determined by the crosstalk between on- and off-
switching and the molecular brightness of the on- and the off-state. Furthermore,
fast thermal relaxation of the switched protein to the respective equilibrium state, 
as well as the population of intermediate states, can affect the reachable switching 
contrast. RSFPs useful for RESOLFT nanoscopy exhibit a contrast higher than 10 (a 
residual fluorescence below 10%), although smaller values of the residual fluores- 
cence have been reported and are beneficial [25, 39]. 
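
The two quantities defined in this section are reciprocals of each other, which the following short sketch makes explicit. The function names and the example signals are hypothetical; the signals are assumed to be background-corrected ensemble measurements.

```python
def switching_contrast(on_signal, off_signal):
    """Ratio of the on-state to the off-state ensemble fluorescence signal."""
    return on_signal / off_signal

def residual_fluorescence_percent(on_signal, off_signal):
    """Off-state signal as a percentage of the on-state signal."""
    return 100.0 * off_signal / on_signal

# Hypothetical ensemble signals in arbitrary units.
on, off = 1000.0, 80.0
print(switching_contrast(on, off))             # 12.5
print(residual_fluorescence_percent(on, off))  # 8.0
```

A contrast of 10 therefore corresponds to a residual fluorescence of 10%, and any higher contrast to a lower residual fluorescence.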


9.4.4 Switching Fatigue 


With every full switching cycle of an ensemble of RSFPs, a fraction of the proteins is 
destroyed. This fraction is generally described as the switching fatigue (given as per- 
centage of the effective brightness) (Fig. 9.3d). Presumably, the switching fatigue is 
mechanistically related to the photostability of a fluorescent protein, which describes 
its stability while the protein is maintained and excited in the fluorescent on-state. 

A low switching fatigue is critical for the usability of RSFPs in RESOLFT 
microscopy [40]. Photobleaching is highly dependent on the light intensity, as dif- 
ferent intensity regimes of photobleaching and nonlinear effects of increasing light 
intensities have been reported [32, 41]. Therefore, lower light intensities typically 
result in reduced switching fatigue [39]. Very photostable RSFPs can be switched 
thousands of times before they are bleached to 50% of their initial brightness [5, 42]. 
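
To relate a per-cycle switching fatigue to the number of cycles until 50% of the initial brightness remains, one can use a simple geometric decay. The sketch below assumes that the same fraction of the remaining proteins is lost in every cycle, which is an idealization (the text above notes that bleaching depends on the light intensity); the function names and the example fatigue value are hypothetical.

```python
import math

def brightness_after_cycles(fatigue_per_cycle, n_cycles):
    """Remaining relative brightness after n_cycles, assuming a constant
    fractional loss per full switching cycle."""
    return (1.0 - fatigue_per_cycle) ** n_cycles

def cycles_to_fraction(fatigue_per_cycle, remaining_fraction=0.5):
    """Number of cycles until the brightness drops to remaining_fraction."""
    return math.log(remaining_fraction) / math.log(1.0 - fatigue_per_cycle)

# Hypothetical fatigue of 0.05% per cycle.
n_half = cycles_to_fraction(0.0005)
print(f"cycles until 50% brightness: {n_half:.0f}")
print(f"brightness after 100 cycles: {brightness_after_cycles(0.0005, 100):.3f}")
```

Under this assumption, a protein that survives thousands of cycles before reaching 50% brightness must lose well below 0.1% of its ensemble per cycle.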

Brightness, switching speed, contrast and switching fatigue are strongly inter- 
twined. The introduction of switching into a conventional fluorescent protein as well 
as the increase of the switching speed of an existing RSFP by mutagenesis generally 
lead to a reduction in the fluorescence quantum yield [5, 39, 43, 44]. Furthermore, 
for negative switching RSFPs a correlation of the switching fatigue with the off- 
switching speed has been reported [25]. 


9.5 Overview of RSFPs for RESOLFT Nanoscopy 


In the last decade, a number of new RSFPs have been engineered using semi-rational 
design strategies guided by the crystal structures and insights into the switching 
mechanism. Thereby, either conventional fluorescent proteins were made switch- 
able, or the characteristics of existing RSFPs were modified. The members of this 
growing family of RSFPs have been used for a number of applications [45], includ- 
ing intracellular protein tracking [46], photochromic FRET [47], optogenetics [48], 
optoacoustics [49], and several super-resolution microscopy techniques including 
coordinate-stochastic approaches [9], SOFI [50, 51], (protected) STED [52, 53] and 
RESOLFT/NL-SIM [39, 42]. 

For each application, distinct RSFP properties are crucial and RSFPs have been
optimized and adapted for the specific demands. In the following, we provide an
overview of the published RSFPs emitting in the green, yellow and red regimes of
the visible spectrum, which were used in RESOLFT microscopy or have properties
beneficial for RESOLFT imaging (Fig. 9.4, Table 9.1).


Fig. 9.4 Fluorescence emission wavelength versus molecular brightness of RSFPs. Displayed
are the molecular brightness of the on-state (normalized to EGFP) and the fluorescence emission
wavelength maximum of selected RSFPs. The selection is not comprehensive. Note that only RSFPs
with a (potential) usability for nanoscopy are presented


9.5.1 RSFPs Emitting in the Green 


RSFPs emitting in the green have been engineered based on several well-characterized 
fluorescent proteins isolated from Hydrozoan and Anthozoan species. The switching 
wavelengths of these green fluorescent RSFPs are at ~488 and ~405 nm, correspond- 
ing to the absorption bands of the deprotonated and the protonated chromophore. The 
first experimental implementation of live-cell RESOLFT nanoscopy utilized the neg- 
ative switching rsEGFP [42], which has been tailored for RESOLFT nanoscopy based 
on the Hydrozoan EGFP [60]. Derived from rsEGFP, a number of other RSFPs were 
engineered such as families of rsEGFPs [5, 56], rsGreens [25], and rsFolders [14], 
with further optimized expression or modified switching characteristics for specific 
applications. 

The EGFP derived RSFPs show very good tagging capabilities, are true monomers,
and exhibit fast to very fast switching kinetics. Especially the high resistance of
rsEGFP2 to switching fatigue at high and low light intensities makes it the most
widely used RSFP in RESOLFT imaging to date.


Table 9.1 Properties of RSFPs for RESOLFT nanoscopy
Protein         | Chr. (aa) | λex (nm) | λem (nm) | Switching mode | Switching wavelength (nm) | Φfl   | ε (10^3/(M·cm)) | References
GmarsQ          | QYG       |          |          | Negative       |                           | 0.64  | 3               | [54]
GmarsT          | TYG       |          |          | Negative       |                           | 0.53  | 55              | [35]
rsEGFP2         | AYG       |          |          | Negative       |                           | 0.3   | 61              | [5]
rsFolder        | AYG       |          |          | Negative       |                           | 0.25  | 52              | [14]
rsFolder2       | AYG       |          |          | Negative       |                           | 0.23  | 44              | [14]
NijiFP (green)  | HYG       |          |          | Negative       |                           | 0.64  | 41              | [55]
rsGreen1        | TYG       |          |          | Negative       |                           | 0.42  | 58              | [25]
rsEGFP N205S    | TYG       |          |          | Negative       |                           | 0.45  | 57              | [56]
rsEGFP          | TYG       |          |          | Negative       |                           | 0.36  | 47              | [42]
Skylan-S        | SYG       |          |          | Negative       |                           | 0.64  | 152             | [51]
Skylan-NS       | LYG       |          |          | Negative       |                           | 0.59  | 134             | [39]
Kohinoor        | CYG       |          |          | Positive       |                           | 0.71  | 63              | [29]
mGeos-M         | MYG       |          |          | Negative       |                           | 0.85  | 52              | [57]
Dronpa M159T    | CYG       |          |          | Negative       |                           | 0.23  | 62              | [44]
mIrisFP (green) | HYG       |          |          | Negative       |                           | 0.54  | 47              | [58]
rsFastLime      | CYG       |          |          | Negative       |                           | 0.77  | 39              | [44]
Dronpa          | CYG       |          |          | Negative       |                           | 0.85  | 95              | [37]
Padron          | CYG       |          |          | Positive       |                           | 0.64  | 43              | [9]
Spoon           | GYG       |          |          | Decoupled      |                           | 0.5   | 54              | [24]
Dreiklang       | GYG       |          |          | Decoupled      |                           | 0.41  | 83              | [23]
NijiFP (red)    | HYG       |          |          | Negative       |                           | 0.65  | 42              | [55]
mIrisFP (red)   | HYG       |          |          | Negative       |                           | 0.59  | 33              | [58]
rsTagRFP        | MYG       |          |          | Negative       |                           | 0.11  | 37              | [47]
asFP595         | MYG       |          |          | Positive       |                           | n.d.  | n.d.            | [28]
rsFusionRed2    | MYG       |          |          | Negative       | 400-510                   | 0.12  | 36              | [34]
rsFusionRed3    | MYG       | 580      | 607      | Negative       | 400-510, 592              | 0.08  | 38              | [34]
rsCherryRev     | MYG       | 572      | 608      | Negative       | 450, 561                  | 0.005 | 84              | [27]
rsCherryRev1.4  | MYG       | 572      | 609      | Negative       | 450, 592                  | n.d.  | n.d.            | [59]
rsCherry        | MYG       | 572      | 610      | Positive       | 561, 450                  | 0.02  | 80              | [27]

The very bright Anthozoa protein Dronpa, which is based on the oligomeric pro- 
tein 22G from Pectiniidae spec., was the first RSFP to be used in live-cell imaging 
[46]. Dronpa exhibits comparatively slow switching and very high switching fatigue; 
Dronpa’s crystal structures of the on- and off-state were solved early, guiding the 
development of models for the switching mechanisms and strategies for semi-rational 
protein design [10, 44]. Based on Dronpa, several RSFPs with improved switching 
properties including rsFastLime, Dronpa-M159T [44] and Dronpa-3 [43] were devel-
oped. The increase in switching speed of these mutants was always accompanied by
a reduction of their respective molecular brightness; still, Dronpa-M159T is well 
suited for point-scanning RESOLFT [61]. 

A single-residue exchange at position 159 (Met159Tyr) is sufficient to reverse
the switching mode of rsFastLime, i.e. this mutation converts the negative switching
RSFP rsFastLime into the positive switching rsFastLime-M159Y. Additional mutations
led to the engineering of the bright, positive switching RSFP Padron, which exhibits a
high switching contrast [9]. Further mutagenesis changed the switching characteristics
and photostability of Padron and led to the development of Kohinoor [29], opening the
door to live-cell RESOLFT with positive switching RSFPs.

In recent years, several RSFPs have been engineered using photoconvertible fluo- 
rescent proteins as scaffolds. Conventional photoconvertible fluorescent proteins are 
irreversibly switched from one emission maximum (generally green) to another one 
(generally red). The mutation Phe173Ser was shown to introduce negative switching 
to the photoconvertible Anthozoa proteins Dendra2 [62] and mEos [63], resulting in 
the multiphotochromic RSFPs NijiFP [55] and (with additional mutations) mIrisFP 
[64]. NijiFP and mIrisFP are both photoconvertible and negatively switchable in the 
red and green form. 

All reported green to red photoconvertible fluorescent proteins have a histidine at 
the first amino acid position of their chromophore, which is essential for their photo- 
convertibility [65]. Exchanging this histidine by another amino acid residue can trans- 
form a photoconvertible protein into a negative switching RSFP. This was reported 
for the Anthozoa protein mEos2 [66]. By exchanging the His for different amino 
acids (Cys, Glu, Phe, Leu, Met, Ser, partially in combination with the Phe173Ser 
mutation), a series of negative switching RSFPs (mGeos-X), with switching prop- 
erties similar to those of various Dronpa variants, was produced [57]. Likewise, the 
His62Leu mutation of the protein mEos3.1 [67] (a true monomer and bright mEos2 
descendant), resulted in the RSFP Skylan-NS [39], which exhibits negative switching 
characteristics with high brightness and excellent switching contrast. 

Following the same strategy, exchange of the chromophoric His in the photocon- 
vertible Anthozoa FP mMaple3 [68], in combination with Met168Ala (corresponding 
to Met159 in Dronpa), resulted in a series of negative switching RSFPs (GMars-X) 
displaying various switching kinetics beneficial for different RESOLFT applications, 
in particular for parallelized RESOLFT [35, 54, 69]. 


9.5.2 RSFPs Emitting in the Yellow 


Based on the yellow fluorescent protein Citrine (a mutant of GFP) [70], Dreiklang 
[23] and its descendant Spoon [24] were developed. As detailed above, in Dreiklang 
the switching is decoupled from fluorescence readout and this RSFP has been used 
for single-beam and parallelized RESOLFT imaging [23, 31, 38]. Limitations of
Dreiklang are its comparatively low switching speed, the requirement for two UV
light wavelengths for switching and its pronounced switching fatigue. However, in
principle, the decoupled switching mechanism should be very beneficial for most
RESOLFT applications. Remarkably, no conventional RSFPs have been reported 
based on a yellow fluorescent protein so far. 


9.5.3 RSFPs Emitting in the Red 


Even though the positive switching red RSFP asFP595 was the first RSFP used for 
proof of concept RESOLFT imaging with purified protein solutions [71], its obligate 
tetramerization and poor performance as a fusion tag hindered its use in live-cell 
imaging. 

Derived from the Anthozoa protein mCherry, the RSFPs rsCherry, rsCherryRev 
[27] and rsCherryRev1.4 [59] were established. The high photostability and fast 
switching of rsCherryRev1.4 facilitated point-scanning and parallelized live-cell 
RESOLFT nanoscopy with this red emitting RSFP [38, 59], but its poor expres- 
sion in mammalian cells and a tendency for dimerization limit its applicability. 

Based on TagRFP [72], the red emitting RSFP rsTagRFP [47] was developed. It 
shows good contrast and strong changes in the absorption spectra of the on- and off- 
states. However, a low extinction coefficient and high switching fatigue prevented 
RESOLFT imaging in living cells. Only recently, a new family of red fluorescent 
RSFPs based on FusionRed was engineered [34]. The parent protein shows excellent 
tagging performance and rsFusionRed2 as well as rsFusionRed3 were successfully
used in a parallelized RESOLFT approach [34]. 

In conclusion, a growing family of RSFPs for RESOLFT nanoscopy is available, 
each member having distinct properties. As many properties of RSFPs depend
on the light intensities used and the (cellular) environment, data recorded in different
laboratories may be difficult to compare. Currently, outside the green fluorescence 
spectral regime, only a few RSFPs are available, underscoring the need for further 
protein engineering of red and infrared RSFPs. 


9.6 Applications of RESOLFT Nanoscopy 


Since the first proof of concept demonstration of RESOLFT nanoscopy using puri- 
fied asFP595 [71], numerous RESOLFT live-cell applications have been reported 
[45]. It has been used in point-scanning (Fig. 9.5a) [5, 6, 14, 23, 25, 42, 73] as
well as in various parallelized implementations (Fig. 9.5b) [34, 35, 38, 54, 56, 69, 
74, 75]. 

Arguably, rsEGFP2 is the best suited protein in the point-scanning mode, as it
displays a combination of favorable properties, most prominently its strong
resistance against switching fatigue. Amongst others, rsEGFP2 has been used for
imaging of fixed cells that had been labeled by a primary antibody against a protein
of interest that was detected by a fusion protein consisting of the IgG binding
Z-domain of protein A and rsEGFP2 [73]. Live-cell RESOLFT imaging was established
on CRISPR/Cas9 edited cell lines expressing rsEGFP2 fusion proteins at endogenous
levels [76]. Likewise, rsEGFP2 was used for RESOLFT nanoscopy in living
Drosophila melanogaster tissues and even in intact transgenic second instar fly
larvae [6] (Fig. 9.5e). Using rsEGFP2, tens of images with a resolution below 50 nm
can be taken for small fields of view at a rate of one frame every few seconds. To
improve specific characteristics, several variants such as rsGreen and rsFolder were
generated and used for RESOLFT nanoscopy [14, 25]. The fast switching Dronpa variant
Dronpa-M159T has switching kinetics similar to those of rsEGFP2, and it has also been
used in a few RESOLFT applications [61, 77]. Next to these negative-switching RSFPs,
the positive-switching RSFP Kohinoor [29] and the decoupled RSFP Dreiklang were
used for cellular RESOLFT imaging [23, 31, 38].


Fig. 9.5 Examples of RESOLFT imaging of living cells. a Point-scanning RESOLFT microscopy
of intermediate filaments. Keratin19-rsEGFP2 expressed in living Ptk2 cells imaged in the confocal
(left) and point-scanning RESOLFT mode (middle) [5]. The graphs show fluorescence intensity line
profiles taken at sites indicated in the magnification (right). b Parallelized RESOLFT microscopy
of intermediate filaments. Keratin19-rsEGFP-N205S expressed in living Ptk2 cells [56]. c Two
color RESOLFT imaging. Vimentin-rsCherryRev1.4 and Keratin-rsEGFP2 were imaged in HeLa
cells [59]. d Parallelized RESOLFT microscopy using a red RSFP. Living U2OS cells expressing
rsFusionRed3-Vimentin [34]. e In vivo RESOLFT microscopy. Time-lapse point-scanning
RESOLFT imaging of rsEGFP2-Tubulin facilitates the observation of tubulin dynamics in intact
living fly larvae [6]. Images are adapted with permissions from the listed references. Scale bars:
a 1 µm; b 10 µm; c 0.25 µm; d 5 µm; e 1 µm

Several strategies have been applied to perform dual color live-cell RESOLFT 
nanoscopy, including an approach that exploited the different switching speeds and 
fluorescence lifetimes of two RSFPs [78]. Despite the fact that the red fluorescent 
proteins are outperformed by the green ones, some studies utilized a red fluorescent 
RSFP in combination with a green [59] or yellow fluorescent RSFP [38] for dual 
label RESOLFT imaging (Fig. 9.5c). 

Because of the long dwell times, single beam scanning RESOLFT
nanoscopy of large fields of view is rather slow and recording of a single image
may take several tens of seconds. To address this limitation, several parallelization
strategies have been successfully implemented. The first realization of parallelized
RESOLFT relied on incoherently superimposed orthogonal standing light waves that
allowed imaging of whole cells in a few seconds [56], and other strategies were reported
subsequently (Fig. 9.5b, d) [38, 74, 79]. RESOLFT nanoscopy was also implemented
in a light sheet approach [80] and has been combined with STED nanoscopy [52].


9.6.1 Other Fluorophores for RESOLFT Nanoscopy 


Most current realizations of RESOLFT imaging used RSFPs from the GFP fam- 
ily. Only recently, several other photochromic fluorophores have been utilized for 
RESOLFT-type nanoscopy. This includes rsLov1, an engineered reversibly switch- 
able protein based on the bacterial photoreceptor YtvA from Bacillus subtilis that 
binds flavin mononucleotide as a chromophore [81]. Also, organic fluorophores were 
successfully used, including carboxylated photoswitchable diarylethenes [82] and 
Cy3-Alexa647 heterodimers [83]. Currently, however, these novel probes are out- 
performed by RSFPs based on the GFP backbone, with regard to basically all pho- 
tophysical parameters important for live-cell RESOLFT nanoscopy. 


9.7 Outlook 


RESOLFT nanoscopy opens up very flexible imaging schemes. For example, the 
attained resolution can be adjusted so that the resolution is decreased in favor of 
reduced phototoxicity, or of an increased recording speed, or in favor of the number 
of images taken before bleaching. This flexibility, in combination with the relatively 
low light intensities required, makes RESOLFT nanoscopy an exciting option for 
live-cell applications.

Recent developments enabling a parallelization of the method have made it possible to
overcome the rather low imaging speed of single-beam solutions. Still, the potential of
RESOLFT nanoscopy has not been fully exploited as yet. This is mainly due to the
lack of suitable RESOLFT probes emitting in the red and infrared spectral regime, 
which hinders the development of robust multi-color RESOLFT applications. We 
predict that this gap will be closed in the near future, as a number of promising 
templates for the development of such probes are emerging [84].


Chapter 10
A Statistical and Biophysical Toolbox
to Elucidate Structure and Formation
of Stress Fibers


Benjamin Eltzner, Lara Hauke, Stephan Huckemann, Florian Rehfeldt 
and Carina Wollnik 


Abstract We are concerned with statistically validated early mechanically guided 
differentiation of human mesenchymal stem cells (hMSCs). This chapter reviews 
and extends methods of fixed and live imaging of hMSCs, automated reliable and 
unbiased near real-time filament extraction and digitization for massive data via 
the FilamentSensor, suitable aggregation of simple (area, mean orientation, aspect 
ratio and order parameter) and advanced (orientation mode persistence and orienta-
tion fields) data descriptors and methods of their non-Euclidean inferential statistics.
As an example, we study the morphology of stress fibers in fixed and live hMSCs within
24 h post seeding on elastic matrices exhibiting Young’s moduli of 1 kPa (soft, 
brain-like elasticity), 11 kPa (intermediate, muscle-like stiffness) and 30 kPa (hard, 
pre-calcified bone rigidity). The combination of these methods constitutes a novel 
integrated toolbox, where for instance, statistical insight may be used to guide exper-
imental design.


10.1 Introduction


During the last two decades it has become evident that the mechanical properties 
of the cellular micro-environment are as important for cellular behavior and home- 
ostasis as traditionally investigated biochemical cues [1, 2]. Especially striking was 
the finding that differentiation of human mesenchymal stem cells (hMSCs) can be 
mechanically induced by culturing them on elastic substrates of different Young’s 
moduli E [3]. While upregulation of specific differentiation markers is typically 
observed after five or more days, fundamental mechanical interactions between cells 
and the substrates take place immediately after adhesion on the substrate. Interest- 
ingly, during this early stage (within the first 24 h) of this mechano-guided differen- 
tiation process in hMSCs, the structure and polarization of actin-myosin stress fibers 
as quantified by an order parameter S depend critically on Young’s modulus E [4]. 

Stress fibers are contractile structures mainly composed of actin filaments, myosin 
motor mini filaments (in particular non-muscle myosin II isoforms) and distinct types
of actin cross-linking proteins (e.g. α-actinin, fascin, etc.). They play the role of
‘cellular muscles’ generating contractile forces and connecting to the extracellular 
matrix (ECM) via focal adhesions, thereby also transmitting forces to the ECM [5]. 
Acto-myosin filaments are also considered to be the principal force sensors of the 
cell that translate mechanical cues from the surroundings into biochemical signaling, 
eventually leading to cell differentiation [2, 6]. Previous experiments with fixed cells 
revealed the important role of acto-myosin cytoskeleton structure formation for the 
mechanically induced differentiation of hMSCs [4, 7]. 

To build statistically validated models and theories linking substrate elasticity to
early hMSC differentiation, the filament structure has to be visualized over time,
binarized in an unbiased fashion, aggregated into descriptors and analyzed, possibly 
within a feedback loop. Due to high biological diversity, large amounts of data 
are required for statistical power. In turn, such massive data require near real-time 
processing. 

In order to visualize these filaments selectively, fluorescence microscopy proves
useful and typical images of acto-myosin stress fibers of different quality in fixed
and live cells are displayed in Fig. 10.1. One of the main differences, however, between
experimental visualization of the acto-myosin cytoskeleton in fixed cells at particular
time points and live cell imaging is the signal to noise ratio of the microscope images.
Fixed cells can be stained with many different methods and allow for saturation with
fluorescent dyes, which typically leads to crisp images (see top row of Fig. 10.1).
In contrast, live cell imaging, as detailed in Sect. 10.2, relies on transfection with
fluorescent fusion proteins that need to be expressed in the cell and typically leads
to worse signal to noise ratios (see bottom row of Fig. 10.1), which are challenging for
subsequent image processing.

Once these images are obtained, the challenge consists in extracting the underlying
filament structure in near real-time in an automated and unbiased fashion. To this end,
we review the FilamentSensor (FS) from [8] in Sect. 10.3. It integrates general and
specifically tailored preprocessing with an elaborate binarization routine, to identify
filaments of varying widths, lengths and angles. We describe a novel algorithm able
to detect slightly bent filaments. Since the FS is modular and open source, it can
be easily extended to suit related image analysis tasks. To guarantee unbiasedness,
tunable parameters can be learned, for instance, from the benchmark data set (BDS)
in [8] that has been manually labeled by specialists.


Fig. 10.1 Varying quality (top fixed, bottom live) filament expressions of fluorescence micro-
graphs of human mesenchymal stem cells with scale bars at 50 µm. Subfigures: a good quality
image of a fluorescently stained fixed cell of large size with clearly visible stress fibers on a substrate
with a Young’s modulus E = 10 kPa; b medium quality image of a fluorescently stained fixed cell
of moderate size with inhomogeneous brightness and slight blur on glass; c poor quality image of
a live cell of moderate size with considerable noise and excessive brightness due to overexposure
on glass; d very poor quality image of a live cell of moderate size with very low contrast due to
bleaching, considerable blur and hardly discernible stress fibers

From the binarized filament structure various morphological descriptors can be
extracted. Simple summary statistics are (weighted) mean orientation, area and aspect
ratios of principal components (PCs). More subtle is the order parameter derived
from the angle between extrinsic mean orientation and first PC, quantifying the
anisotropy of the acto-myosin cytoskeleton, with which [4] linked substrate elasticity
to stem cell differentiation in a statistically significant way. A more sophisticated
analysis [9] links number and persistence (under smoothing) of modes (i.e. dominating
directions) of the distribution of weighted filament orientations to substrate elasticity;
in particular, this required the development of a causal circular scale space theory in [9].

The above descriptors coarsely describe common filament orientations. In a
finer approach, the concept of different single orientation fields (OFs) described
in Sect. 10.4 takes into account the tendency of filament orientations to change in a
spatially smooth way. In order to statistically analyze different moments of their
distribution simultaneously (e.g. jointly the intrinsic mean on a first geodesic PC
together with the first geodesic PC), backward nested descriptor (BND) analysis is applied
in Sect. 10.4.2 along with its asymptotic theory from [10]. This allows us to elucidate
fundamental differences between fixed and live cell analysis, with consequences for
experimental design.

In Sect. 10.5 we conclude with an outlook on how our biostatistical toolbox can
be used, in various combinations, to tackle problems that have arisen through this
research and problems currently of high interest, for example tracking of individual
filament dynamics and defining and analyzing corresponding descriptors such as 
filament life times. 


10.2 Live Cell Imaging: Opportunities and Challenges


As mentioned above, novel insights into mechanisms of the complex mechanical 
interplay between cells and the extracellular matrix require the analysis of the dynam- 
ics of stress fiber formation and arrangement. Such an experimental approach differs 
fundamentally from fluorescence microscopy of fixed cells. In experiments using 
chemically fixed cells, these can be stained with a variety of fluorescent dyes using 
either antibodies or other small molecules that selectively bind to the protein of 
choice. Living cells however need to be modified genetically to express a fluorescent 
protein fused to the protein of interest or a respective binding partner. While both
methods allow for fluorescently labeled cellular structures, a significant difference
is the homogeneity of the fluorescence intensity: fixed cells will overall show similar
intensities that depend on the staining method and the cellular concentration of the
protein of choice; living cells that are transiently expressing the fluorescent marker
will show a broad distribution of intensities (and even dark cells that do not express
any fluorescent marker) due to the stochastic nature of the transfection process. In
addition, it is essential to monitor many (N > 30) cells in parallel, leading to an
enormous amount of microscopic images. These issues pose distinct challenges to
the image segmentation and analysis algorithm that could not be resolved with our
traditional approach of a pixel-based eLoG orientation analysis [4], mainly due to the
varying image quality (as illustrated e.g. in Fig. 10.1) and also computational time.

In the case of fixed cells, we are using a rigorous protocol to ensure the unbiased
microscopic analysis of single cells. Due to the intrinsic variation of cellular
morphologies it is of paramount importance to exclude any human bias. First, cells
are searched in the fluorescence channel of the nucleus dye to find isolated cells
whose nucleus looks normal and that have no direct neighbors. Here, it is critical to
exclude deformed nuclei, nuclei with a doubled set of DNA (high intensity), or any nuclei
that are within the division process. Next, the fluorescent channel of interest (e.g.
actin, non-muscle myosin IIa) is recorded regardless of the morphology, except for
cases where neighboring cells that might interfere are now observed.

In contrast to the above described protocol for fixed cells, the situation is more 
subtle for parallel live cell imaging. Firstly, cells need to be transfected with a fluo- 
rescent marker that tags the protein of choice (in our case actin). This can be done 
using a fluorescent fusion protein (e.g. GFP-actin) with the immediate drawback of 
over-expression of that protein, differences in assembly kinetics, and potential prob- 
lems with incorporation in distinct actin structures [11]. Most of these issues can 
be avoided using LifeAct [12], a short amino-acid sequence that binds to actin and 
is fused with a fluorescent protein. However, direct comparison also reveals minor
differences between this visualization and staining of subsequently fixed cells with a
phalloidin dye [11].

To avoid influence from neighboring untransfected cells, it is advisable to always
also record an image in phase contrast or brightfield mode. That way any unwanted
additional interactions can be ruled out. However, during time lapse microscopy
recordings several incidents can occur that might affect the statistical analysis of the
cell population. Cells might migrate out of the field-of-view, so that the full recording
will lack a subset of very motile cells; we will address this aspect by smart
repositioning of the sample with real-time analysis of the microscopic pictures. Cells
might undergo apoptosis and exclude themselves from further analysis, reducing
the number of samples. Cells might divide and therefore make the analysis of their
cytoskeletal dynamics very complicated, even precluding it when the cells do not
separate thoroughly. Cells might interact during the time period with neighboring cells,
which will also affect their cell-matrix mechanical interplay and acto-myosin structure.
Altogether, it becomes clear that the population subsets of microscopy analysis of
fixed cells and living cells can differ significantly and appropriate measures and
controls need to be developed to fully understand their impact on the statistical analysis.


10.3 Automated Unbiased Binarization of Filament 
Structure 


The present section is heavily based on the authors’ previous publication [8], which 
is published under an open license (CC BY 4.0). The text and contents from said
publication are reproduced here to achieve a self-contained description of the Fila-
mentSensor.


10.3.1 Related Work


There is an impressive body of techniques for image processing and in particular, for
line detection (for an overview see e.g. [13], Chap. 4). Prior to our development of the
FS, however, methods for the detection of filaments in cell images were often ad hoc,
required manual processing to a considerable extent and were computationally rather 
time consuming, e.g. [4, 14, 15]. The latter two issues are particularly unfavorable 
considering the large number of images to be processed in live cell imaging. 

Moreover, there is a large number of algorithms focusing on analyzing networks
of strongly curved microtubules (this property is not shared by single filaments), such
as line thinning by [16], active contours by [17] and the constrained inverse diffusion
(CID) method by [14]. These methods, however, detect only a skeletal filament
network structure; they leave out filament orientation, length, and width. They aim
at identifying thin microfilaments and not wide stress fibers as we are interested in.

There are methods which aim at extraction not only of filament pixel position 
but also of local orientation such as the FiberScore algorithm by [18], elongated 
Laplacians of Gaussians (eLoGs) by [4] and gradient based methods, e.g. [15, 19]. 

The eLoG method, like the gradient method, is geared towards the detection not
only of filament pixels but also of their orientations. Although filament width and
length are not extracted by these methods, they yield histograms of cumulated
filament length per orientation angle by counting the number of pixels per
orientation, and these histograms are then further analyzed [4, 15].

Local orientation and centerline images are produced by the FiberScore program
[18], which provides global information on accumulated line length and average
width. Line objects are not produced, however. For our cell images, we tried out the
methods applied in FiberScore, but they did not yield optimal results [8]. A fundamental
drawback for applying FiberScore, however, is that neither the program’s nor its
framework’s source code is freely available. Even though the original developer
has been very helpful and supportive in making the program run, FiberScore could not
be tailored to our needs.

With the FS we have developed an image processing tool that returns stress fiber
structures from live cell images as well as from fixed cell images. It is applicable to
use cases where images vary widely in brightness, contrast, sharpness and
homogeneity of fluorescence, cf. Fig. 10.1. Typically, in our setup of live cell imaging, we
observe 30 cells over a period of 24 h, taking an image every 10 min. As we aim at
real time processing, this leaves about 20 s of processing time per image.
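
The 20 s budget follows from simple arithmetic if one assumes that each of the 30 cells is imaged once per 10 min cycle and that all images of a cycle must be processed before the next cycle starts. The following minimal sketch spells this out; the variable names are illustrative and not taken from the FS software.

```python
# Per-image processing budget for (near) real-time analysis.
cells_per_run = 30        # cells observed in parallel (from the text)
interval_s = 10 * 60      # one image per cell every 10 min
duration_h = 24           # total observation time

# Time available per image if one cycle must be finished before the next starts.
budget_s = interval_s / cells_per_run

# Resulting data volume of one experiment.
images_per_cell = duration_h * 3600 // interval_s
total_images = images_per_cell * cells_per_run

print(budget_s, images_per_cell, total_images)
```

Under these assumptions a single 24 h experiment already produces several thousand images, which is why manual or slow processing is impractical.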

The FS thus developed can be used to binarize filament structures for any (sets
of) images containing fiber features. Applications in a wide range of use cases come
to mind, in particular in the context of actin fiber structures, e.g. [20, 21], but also
in more general contexts in medical imaging, biology, and materials science.
As the FS is modular and easily extensible, several authors, including [22, 23], have
built on it after it was first published.

Notably, there is also a rising demand to address the task of tracing and tracking
stress fibers, both over space and over time. We mention studies on migrating cells
which display a variety of stress fiber types (dorsal, ventral, arc) that appear at
different loci inside a migrating cell [21, 24-28]. Their exact cellular function is
still unclear; it could be clarified, however, using live cell filament digitization.
Indeed, following stress fiber dynamics over time may give further
insight into the formation and function of filament structure. A novel method to analyze
traction force microscopy data, so-called model-based traction force microscopy, has
recently been described by [29]. In this context, it is necessary to detect and mark
the stress fibers in a cell in order to link forces to fiber location and develop deeper
insight into cellular force generation and transmission to the substrate. As mentioned
before, corresponding live cell experiments are ideally performed simultaneously
for many cells in order to arrive at statistics that are sufficiently significant. This
requires that algorithms for fiber analysis perform tracing and tracking in (nearly)
real time.


10.3.2 The FilamentSensor and the Benchmark Dataset 


To obtain the full information of the stress fibers in cells, namely location, length, 
width, and orientation, from repeated observations of living cells under widely vary- 
ing conditions in near real time the FS has to extract 


(I) fast and unsupervised,
(II) robustly,
(III) all filament features: location, length, width and orientation;


where (II) implies dealing with several specific problems illustrated in Fig. 10.2:


(IIa) detecting darker lines crossing bright lines,
(IIb) dealing with image inhomogeneities and
(IIc) dealing with image blur and noise.


The FS is specifically designed to meet these challenges. Dealing with image
inhomogeneity calls for the application of local image processing tools. Blurring
effects are mitigated by line enhancement through direction sensitive methods.
Crossings of lines of varying intensities can be successfully detected by what we
call line Gaussians, which utilize oriented thin masks. After local binarization, an
adaptation of the semilocal line sensor approach to fingerprint analysis [30] is applied
to extract all filament features. As the FS is modularized, employs local and
orientation dependent image analysis methods and outputs the entire filament data,
expert knowledge, such as detecting fewer filaments in specific low variance areas,
can be easily incorporated.

Fig. 10.2 Challenges for filament extraction. a blur (detail from Fig. 10.1d). The overall contrast
of the cell body is very low and lines are hardly discernible. b overexposure and noise (Fig. 10.1c). 
The extensive regions of maximal brightness hide any structure that may be present in those regions. 
Salt and pepper noise is clearly visible as dark spots in bright areas and bright spots in dark areas. 
c filament crossings (Fig. 10.1b). A bundle of roughly vertical filaments of varying brightness crosses 
a bundle of roughly horizontal filaments with varying brightness 


10.3.3 Detecting Slightly Bent Filaments 


After preprocessing and binarization, as described in [8], filament data is extracted 
from the white pixels of the binarized image. Visual inspection of fluorescence micro- 
scopic images reveals that actin stress fibers can be slightly bent. To take this into 
account, we have adjusted the FS to follow slightly curved lines on a piecewise linear 
path. Line detection is performed by the following algorithm. 


1. Every white pixel (x, y) is assigned a width, W(x, y). This is done by taking 
circular pixel neighborhoods of the pixel (cf. Fig. 10.3) with increasing diameter. 
A diameter is accepted, if the ratio of white pixels of the binary image is above 
an adjustable tolerance (default 95%). If a diameter was accepted, the next larger 
diameter is proposed until a diameter is not accepted. The width W(x, y) at the 
pixel is then given by the largest accepted diameter at the pixel. In particular, this 
gives a range of widths $1 < w_1 < \dots < w_g = \max W(x, y)$ attained by pixels.


Fig. 10.3 Some circle masks. These are examples of the circular masks used by the segment sensor 
algorithm to determine line width. The circles displayed here correspond to diameters of 2, 4, 6 and 
8 pixels. The masks are squares with an odd number of pixels as they are centered at a unique pixel

A temporary list L, the filament data set F, and the orientation field O are each
initialized as the empty set.

2. For every white pixel, starting with the highest width value and continuing with
decreasing width value, apply the CurveTracer (CT). The curves are represented
piecewise linearly. The user specifies four parameters, namely the length of linear
pieces $l_{\mathrm{lin}}$, the direction step size in degrees $\varphi_{\mathrm{step}}$, the maximal number of
angle steps between two adjacent linear pieces $n_{\mathrm{step}}$, leading to a maximal angle of
$\varphi_{\mathrm{step}} \cdot n_{\mathrm{step}}$ between adjacent line pieces, and a minimal line length $l_{\mathrm{min}}$.


a. For each white pixel (x, y) the CT probes into a number of directions
(by default $\varphi_{\mathrm{step}} = 3^\circ$; this corresponds to 120 directions 3°, 6°, ..., 360°).
For each direction the CT follows a straight line from (x, y) for a number
of $2 l_{\mathrm{lin}}$ pixels. For each of these directions, the average width value
is calculated and two almost opposite directions (with a relative angle in
$[180^\circ - n_{\mathrm{step}} \varphi_{\mathrm{step}},\ 180^\circ + n_{\mathrm{step}} \varphi_{\mathrm{step}}]$) with the largest combined average width
are selected. For each of these two directions the CT now proceeds separately
as follows, using a point list P containing the starting point.

i. Move $l_{\mathrm{lin}}$ pixels in the current direction $\varphi_c$, and add the end point p to P. If
the average width along this line is below 1, remove all pixels with width
0 from the end of the line and then proceed removing pixels (possibly with
width greater than 0) from the end until the average width is at least 1.
Then add the new final point p of the line to P and stop.

ii. From p probe for $2 l_{\mathrm{lin}}$ pixels into the $2 n_{\mathrm{step}} + 1$ directions
$\{\varphi_c - n_{\mathrm{step}} \varphi_{\mathrm{step}},\ \varphi_c - (n_{\mathrm{step}} - 1) \varphi_{\mathrm{step}},\ \dots,\ \varphi_c + n_{\mathrm{step}} \varphi_{\mathrm{step}}\}$
and calculate average width values for every direction. Set the new $\varphi_c$ to the
direction of highest average width.

iii. Return to step i.

b. When the CT searches in both directions have reached their end points, the
combined length of the line pieces is determined and if it is larger than $l_{\mathrm{min}}$ the
list of points from both pieces is stored to L. The CT is illustrated in Fig. 10.4.

c. In the next step, segments in L are called in the order of their length, long
segments first. For every segment, the orientation field O (which is empty
when first called) is looked up for every pixel on the segment. If less than
30 % of the segment’s pixels have a conflicting orientation entry in O, i.e.
the entry in O differs by less than an adjustable tolerance angle (per default
20°) from the segment’s orientation, the segment is accepted as valid. For
every pixel within a circular neighborhood with diameter $w_i + 2$ pixels (in
order to avoid duplicate lines in case the CT does not perfectly follow the path
of maximal width) of a segment pixel, the segment’s orientation is stored to
O, overwriting possible previous entries. The segment is then also added to F.
If at least 30 % of the pixels on a segment have a conflicting orientation, we
have the following cases.

i. If O does not carry a conflicting orientation for any of the endpoints, the 
segment is discarded. 



Fig. 10.4 Illustration of the CT algorithm. The probing distance is twice as far as the step length 
of the CT. Per default, the CT probes three directions at each step where the number of directions as 
well as the angle between them can be adjusted by the user. The probing directions are visualized 
in three shades of blue. The chosen direction is then marked green and the others are marked red 


ii. Otherwise, the endpoints with conflicting orientations are iteratively 
removed from the segment until the remaining segment’s endpoints no 
longer have a conflicting orientation. If the resultant segment length is 
above the threshold of minimal filament length, this new segment is added 
back to L and the original one is removed. The new segment is revisited 
when its length is called. 
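
The acceptance rule of step c can be sketched as follows. This is an illustration under stated assumptions rather than the FS implementation: the orientation field O is modeled as a dictionary from pixel coordinates to angles in degrees, orientations are compared modulo 180°, and the function names are hypothetical. Following the text, an entry counts as conflicting when it is similar to the segment’s own orientation, since this indicates a duplicate of an already accepted line.

```python
def angle_diff_deg(a, b):
    """Smallest difference between two orientations (period 180 degrees)."""
    d = abs(a - b) % 180.0
    return min(d, 180.0 - d)


def conflict_fraction(segment_pixels, segment_angle, orientation_field, tol_deg=20.0):
    """Fraction of segment pixels whose entry in O is within tol_deg of the segment."""
    conflicts = sum(
        1
        for p in segment_pixels
        if p in orientation_field
        and angle_diff_deg(orientation_field[p], segment_angle) < tol_deg
    )
    return conflicts / len(segment_pixels)


def is_accepted(segment_pixels, segment_angle, orientation_field, max_conflict=0.30):
    """A segment is accepted if less than 30 % of its pixels are conflicting."""
    return conflict_fraction(segment_pixels, segment_angle, orientation_field) < max_conflict


# Hypothetical example: a horizontal 10-pixel segment at 12 degrees.
field = {(0, 0): 10.0, (1, 0): 10.0, (2, 0): 100.0}
pixels = [(x, 0) for x in range(10)]
print(conflict_fraction(pixels, 12.0, field), is_accepted(pixels, 12.0, field))
```

Note that the entry of 100° at pixel (2, 0) does not count as a conflict, because a line crossing at a clearly different angle is a legitimate crossing and not a duplicate.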


As lines are blurred due to scattering and as the preprocessing usually enhances 
line width, the FS tends to find greater line width than a human expert (cf. benchmark 
data set in [8]). 
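
The width assignment of step 1 can be sketched with NumPy as shown below. The sketch makes several assumptions that are not stated in the text: diameters are increased in steps of one pixel starting at 1, pixels outside the image count as black, and the hypothetical parameter `max_diameter` caps the search. The disk construction is one plausible choice and need not coincide with the masks of Fig. 10.3.

```python
import numpy as np


def disk_mask(diameter):
    """Boolean disk of the given diameter on a square grid with odd side length."""
    r = diameter / 2.0
    half = int(np.ceil(r))
    yy, xx = np.mgrid[-half:half + 1, -half:half + 1]
    return xx ** 2 + yy ** 2 <= r ** 2


def width_map(binary, tolerance=0.95, max_diameter=30):
    """Largest accepted disk diameter at each white pixel (0 at black pixels)."""
    binary = np.asarray(binary, dtype=bool)
    h, w = binary.shape
    widths = np.zeros((h, w), dtype=int)
    for y, x in zip(*np.nonzero(binary)):
        accepted = 0
        for d in range(1, max_diameter + 1):
            mask = disk_mask(d)
            half = mask.shape[0] // 2
            # Copy the neighborhood into a patch; outside the image stays black.
            patch = np.zeros_like(mask)
            y0, y1 = max(0, y - half), min(h, y + half + 1)
            x0, x1 = max(0, x - half), min(w, x + half + 1)
            patch[y0 - (y - half):y1 - (y - half),
                  x0 - (x - half):x1 - (x - half)] = binary[y0:y1, x0:x1]
            if patch[mask].mean() > tolerance:   # ratio of white pixels in the disk
                accepted = d
            else:
                break                            # stop at the first rejected diameter
        widths[y, x] = accepted
    return widths


# Hypothetical example: a horizontal bar that is three pixels thick.
image = np.zeros((21, 21), dtype=bool)
image[9:12, :] = True
W = width_map(image)
print(W[10, 10], W[9, 10])
```

In this example the center line of the bar receives the full bar thickness while the edge rows receive width 1, which is why the CT starts from the pixels of highest width.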


10.4 Orientation Fields 


We make use of the individual line data provided by the FS to identify local orienta- 
tion fields of fibers. These provide a more detailed picture of the cytoskeleton than 
simple summary statistics or orientation histograms, since they take local features 
into account. Orientation fields are contiguous regions in the cell filled with filaments 
of similar orientation. The local orientation of the field may change slowly over the 
cell, so as to encompass the case of a curved cell, where stress fibers follow the cell 
shape. As input we use an orientation map denoting the line orientations determined 
by the filament sensor at each pixel. Our algorithm uses relaxation labeling, first 
described by [31]. 

As a first step, the image is covered with a rectangular equidistant grid of pixels 
with spacing a. The grid points define blocks in the following. The minimal spacing 
of the grid is 5 pixels. While the number of blocks b covering the cell area in the 
image exceeds 500, the spacing is successively increased by steps of two pixels. As
a result, the maximal number of non-trivial blocks is 500. In this way, the algorithm
can deal with images of different magnification and cells of different size.
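
The choice of the grid spacing can be sketched as follows. As an assumption for this illustration, a block is counted as covering the cell area if its grid point lies inside a boolean cell mask; the function name and the mask are hypothetical.

```python
import numpy as np


def choose_grid_spacing(cell_mask, min_spacing=5, step=2, max_blocks=500):
    """Increase the spacing until at most max_blocks grid points lie in the cell."""
    cell_mask = np.asarray(cell_mask, dtype=bool)
    a = min_spacing
    while True:
        n_blocks = int(cell_mask[::a, ::a].sum())   # grid points inside the cell
        if n_blocks <= max_blocks:
            return a, n_blocks
        a += step


# Hypothetical example: a cell that fills a 200 x 200 pixel image.
print(choose_grid_spacing(np.ones((200, 200), dtype=bool)))
```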

For every block, we use a square isotropic Gaussian mask with $\sigma = \max(15, [1.2a])$
centered at the grid point to assign weights to nearby pixels. The length
of the mask from the center is $l = [2.5\sigma]$ pixels in every direction, after which
it is truncated. Using the orientation map and these weights, we get an orientation
histogram for each block. This orientation histogram is then smoothed with a wrapped
Gaussian kernel with $\sigma = 6^\circ$. Of the smoothed histogram, all maxima are stored as
local orientations of the block. If several neighboring bins of the smoothed histogram
have the same, locally maximal, value, the leftmost bin, corresponding to the smallest
angle, is used.
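
The block histogram step can be sketched as shown below. Several details are assumptions made for this illustration: the orientation map stores angles in degrees in [0°, 180°) and NaN where no filament was found, the histogram uses 1° bins, and the square brackets in the formulas for the mask are read as rounding up. All function names are hypothetical.

```python
import numpy as np


def smooth_wrapped(hist, sigma_bins):
    """Circularly convolve a histogram with a wrapped Gaussian kernel."""
    n = len(hist)
    k = np.arange(n)
    # Wrapped Gaussian: sum the Gaussian over a few windings of the circle.
    kernel = sum(np.exp(-0.5 * ((k + m * n) / sigma_bins) ** 2) for m in range(-2, 3))
    kernel /= kernel.sum()
    idx = (k[:, None] - k[None, :]) % n          # circular bin differences
    return kernel[idx] @ hist


def local_maxima(s):
    """Indices of circular local maxima; on a plateau the leftmost bin is used."""
    n = len(s)
    peaks = []
    for i in range(n):
        if s[i - 1] < s[i]:                      # strict rise into bin i
            j = i
            while s[(j + 1) % n] == s[j]:        # walk along a plateau
                j = (j + 1) % n
            if s[(j + 1) % n] < s[j]:            # plateau ends with a drop
                peaks.append(i)
    return peaks


def block_orientations(orient_map, center, a, n_bins=180):
    """Local orientations (degrees) of the block centered at a grid point."""
    sigma = max(15.0, float(np.ceil(1.2 * a)))   # assumption: brackets round up
    half = int(np.ceil(2.5 * sigma))             # truncation length of the mask
    cy, cx = center
    h, w = orient_map.shape
    y0, y1 = max(0, cy - half), min(h, cy + half + 1)
    x0, x1 = max(0, cx - half), min(w, cx + half + 1)
    yy, xx = np.mgrid[y0:y1, x0:x1]
    weights = np.exp(-0.5 * ((yy - cy) ** 2 + (xx - cx) ** 2) / sigma ** 2)
    patch = orient_map[y0:y1, x0:x1]
    valid = ~np.isnan(patch)
    bins = np.floor(patch[valid] * n_bins / 180.0).astype(int) % n_bins
    hist = np.bincount(bins, weights=weights[valid], minlength=n_bins)
    smoothed = smooth_wrapped(hist, 6.0 * n_bins / 180.0)   # sigma = 6 degrees
    return [i * 180.0 / n_bins for i in local_maxima(smoothed)]


# Hypothetical example: two crossing fiber bundles at 30 and 120 degrees.
orient = np.full((101, 101), np.nan)
orient[48:53, :] = 30.0
orient[:, 48:53] = 120.0
print(block_orientations(orient, (50, 50), a=5))
```

A block at such a crossing keeps both orientations, so it does not qualify as a seed block with a single orientation in the step that follows.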

To perform relaxation labeling, it is necessary to have a number of blocks with 
fixed orientations that can be used as a seed. In order to achieve reliable results, it is 
desirable to have as many seed blocks as possible. Therefore, we try to determine at 
least s = 0.05b seed blocks. For this we collect all blocks with only one orientation 
and apply the following cleanup procedure: 


1. Keep only the largest contiguous region. 

2. Make a histogram of the block orientations, smoothed with a Gaussian kernel with
$\sigma = 6^\circ$, and determine its global maximum $\varphi_{\max}$.

3. Starting with k = 0, we determine the largest contiguous region of blocks whose
orientation $\varphi$ satisfies $|\varphi - \varphi_{\max}| \le k$. We then increase k by steps of 1, until we
reach k = 6 or until the largest contiguous region of blocks reaches or exceeds
the number of s = 0.05b blocks.


If the number of seed blocks is smaller than s, we repeat the cleanup procedure 
for all blocks using all orientations of every block in step 2. 

Once we have a set of seed blocks with seed orientations, we fix these orientations 
and perform a relaxation labeling over all orientations of all non-seed blocks. For 
the relaxation labeling, we use a von Mises type compatibility function 


$$ f(\varphi) = C + B \exp(A \cos(2\varphi)) $$
with $f(0^\circ) = 1$, $f(90^\circ) = -1$, $f(\sigma) = 0$,


where we start with $\sigma = 15^\circ$. If the largest field contains less than 2/3 of line pixels
or less than 85% of the blocks, we repeat the relaxation labeling, increasing $\sigma$ by
steps of 5° to a maximum of $\sigma = 25^\circ$.
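
The three conditions determine the constants A, B and C. The text does not say how they are computed, so the following is our own short derivation: subtracting the first two conditions gives $B = 1/\sinh A$, inserting this into the first gives $C = -\coth A$, and the third condition then reduces to $\cosh A = \exp(A \cos 2\sigma)$, which has a unique positive solution for $\sigma < 45^\circ$. The sketch below solves this equation by bisection; the function names are hypothetical.

```python
import math


def compatibility_constants(sigma_deg):
    """Solve f(0)=1, f(90 deg)=-1, f(sigma)=0 for f = C + B*exp(A*cos(2*phi))."""
    c2s = math.cos(math.radians(2.0 * sigma_deg))

    def g(A):
        # g(A) = 0 is equivalent to cosh(A) = exp(A * cos(2*sigma)).
        return A * c2s - math.log(math.cosh(A))

    lo, hi = 1e-3, 500.0
    if not (g(lo) > 0.0 > g(hi)):
        raise ValueError("no positive solution bracketed for this sigma")
    for _ in range(200):                 # bisection on the bracketing interval
        mid = 0.5 * (lo + hi)
        if g(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    A = 0.5 * (lo + hi)
    B = 1.0 / math.sinh(A)
    C = -1.0 / math.tanh(A)
    return A, B, C


def compatibility(phi_deg, A, B, C):
    """Von Mises type compatibility of two orientations differing by phi_deg."""
    return C + B * math.exp(A * math.cos(math.radians(2.0 * phi_deg)))


for sigma in (15.0, 20.0, 25.0):
    A, B, C = compatibility_constants(sigma)
    print(sigma, round(A, 3), compatibility(0.0, A, B, C), compatibility(90.0, A, B, C))
```

Increasing $\sigma$ lowers A and thus widens the range of orientation differences that are treated as compatible, which matches the role of $\sigma$ as a tolerance in the repeated relaxation runs.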

To preclude too large changes of orientation at medium range, we do not use only 
nearest neighbors for the relaxation, but every block reacts with isotropic Gaussian 
weights with surrounding blocks, where the standard deviation of the Gaussian is 
2% = 5 blocks. Every block also has a dummy orientation, whose probability will 
slowly grow, if none of the block orientations match their neighborhood. Blocks 
neighboring on seed blocks and having reached a probability of 0.999 on one of their 
orientations will be turned into seed blocks, so the field will gradually “freeze” to 
ensure convergence.

Fig. 10.5 Orientation field detection. Line segments extracted by the FS (upper left panel)
colored by local orientation, giving five dominating orientation fields (their range and local orientations
displayed in the other panels). Indeed, colors of the fields, indicating their local orientation, vary
only slightly


When the relaxation has converged, the resulting field is saved and its correspond- 
ing orientations are removed from the blocks. If the remaining nontrivial blocks do 
not form a contiguous region, all regions below minimal size s are removed. The 
procedure is then repeated until no non-trivial blocks remain. Finally, every filament 
is sorted into the orientation field whose local orientation best matches its own ori- 
entation (Fig. 10.5). If the filament orientation diverges by more than 15° from all 
local orientation fields, it is not associated to any field. 
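
The final sorting of filaments into fields can be sketched as below. The sketch assumes that, for a given filament, the local orientation of each field at the filament position is already known, and that orientations are compared modulo 180°. The function names and the dictionary layout are hypothetical.

```python
def angle_diff_deg(a, b):
    """Smallest difference between two orientations (period 180 degrees)."""
    d = abs(a - b) % 180.0
    return min(d, 180.0 - d)


def assign_field(filament_angle, local_field_angles, tol_deg=15.0):
    """Return the id of the best matching field, or None for a rogue filament.

    local_field_angles maps a field id to the local orientation (degrees)
    of that field at the position of the filament.
    """
    best_id, best_diff = None, None
    for field_id, field_angle in local_field_angles.items():
        diff = angle_diff_deg(filament_angle, field_angle)
        if best_diff is None or diff < best_diff:
            best_id, best_diff = field_id, diff
    if best_diff is None or best_diff > tol_deg:
        return None                      # deviates by more than 15 degrees
    return best_id


fields = {0: 2.0, 1: 90.0}
print(assign_field(178.0, fields), assign_field(80.0, fields), assign_field(50.0, fields))
```

Filaments for which the function returns `None` are the ones referred to as rogue filaments in Sect. 10.4.2.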


10.4.1 Orientation Field Evolution 


In most cells on substrates with stiffness 10 and 30 kPa a single orientation field
emerges over 24 h, which contains more than 80% of stress fiber length. In order to
illustrate the evolution of orientation fields in time, we represent the orientation field
at each time point by a gray circle, whose gray level displays the relative amount
of fiber length represented by that field, such that the circle is black when all fibers
are included in the field. The standard deviation of fiber orientations in the field is
displayed by error bars for each circle. The evolution of orientation fields for a typical
cell on an intermediate or stiff gel is displayed in Fig. 10.6.

In Fig. 10.7 we show a typical orientation field evolution for a cell on a soft
substrate with stiffness 1 kPa. In cells on such soft substrates the cytoskeleton is
much less ordered, which is reflected by a large number of small orientation fields
found over time. In a cell where fibers are not ordered, orientation fields
often appear for only a few images. While such fields can be considered spurious, they
still serve to illustrate the disorder of the actin cytoskeleton.

Fig. 10.6 Orientation field time series for a typical cell on a substrate with stiffness >10 kPa,
at each time point represented by a gray circle, whose gray level displays the relative amount of
fiber length represented by that field (the circle is black when all fibers are included in the field).
The standard deviation of fiber orientations in the field is displayed by error bars for each circle.
As is typical for a cell on a stiff substrate, a single main orientation field emerges after a few hours
and remains stable throughout imaging time

In many cells on all gels the cytoskeleton is not fully described by just one
orientation field but is partially ordered. A frequently observed evolution starts out with
an almost chaotic cytoskeleton where the short-lived small orientation fields converge
into a main orientation field over time, as in Figs. 10.6 and 10.7. This process can take
between 4 and 20 h and can even be unfinished at the end of the 24 h observation span.
However, there are also cases where a main orientation field which has remained
stable for many hours suddenly disperses as the cell starts to move (corresponding
to Figs. 10.6 and 10.7 with inverted time axis).

A behavior, which is observed in less than 5 % of cells, is illustrated in Fig. 10.8. 
In this case, a stable main orientation field exists, when at some point in time a new 
orientation field begins to form and the original main field starts to dissolve. This 
behavior requires more thorough investigation, both into the underlying cell dynamics 
and into adequate statistical representation of cytoskeleton order. Elucidating this 
behavior is left for future research.

Fig. 10.7 Orientation field time series for a typical cell on a substrate with stiffness 1 kPa
(notation as in Fig. 10.6) with several local orientation fields indicating a less ordered cytoskeleton.
While over time a dominating orientation field emerges, smaller orientation fields pop out until the

Fig. 10.8 Orientation field time series for a cell on a substrate with stiffness 10 kPa (notation
as in Fig. 10.6). The initial main orientation field starts to decay after 10 h and is superseded by a
new orientation field which is almost perpendicular to it


10.4.2 Backward Nested Descriptor Analysis


In order to use orientation fields for quantitative analysis, we devised the simple low
dimensional orientation field representation (10.1) in [10]. Denote by $M$ the number
of all filament pixels in a cell image, by $m_1$ the number of filament pixels in the
largest orientation field and by $m_2$ the number of pixels in all smaller fields combined.
Then $M - m_1 - m_2$ counts the pixels from “rogue” filaments which do not fit
into any of the orientation fields due to strongly deviating orientation. In order to
compare relative diameters rather than relative areas we observe the quantity


$$ x = (x_1, x_2, x_3) = \left(\sqrt{m_1/M},\ \sqrt{m_2/M},\ \sqrt{1 - (m_1 + m_2)/M}\right)^T \in S^2 \tag{10.1} $$


on a two-sphere. In this representation, the spherical data lie in the first octant. We
observe that points tend to accumulate close to the $x_2 = 0$ plane, representing cells
with one single orientation field, cf. Fig. 10.9.
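
As a small illustration of (10.1), the following sketch maps pixel counts to a point on the sphere. The function name and the example counts are hypothetical.

```python
import math


def sphere_point(M, m1, m2):
    """Map pixel counts to the point (10.1) in the first octant of the two-sphere."""
    if M <= 0 or m1 < 0 or m2 < 0 or m1 + m2 > M:
        raise ValueError("need M > 0 and 0 <= m1 + m2 <= M")
    rogue = M - m1 - m2                  # pixels of filaments outside all fields
    return (math.sqrt(m1 / M), math.sqrt(m2 / M), math.sqrt(rogue / M))


# Hypothetical cell: 70 % of filament pixels in the main field, 20 % in smaller
# fields and 10 % in rogue filaments.
x = sphere_point(1000, 700, 200)
print(x, sum(c * c for c in x))
```

Because the squared coordinates are the three pixel fractions, they sum to one, so every cell image yields a point on the unit sphere. A cell with a single orientation field and no rogue filaments maps to (1, 0, 0).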

In order to interpret live cell observations, we compare these observations to
time series of cells which were fixed after different times on a gel. We have taken
images of fixed cells for each gel rigidity in intervals of 4 h of time on the gel. The
sample sizes are displayed in Table 10.1. These cells were stained with phalloidin,
as opposed to the live cells, which were transfected with Lifeact. To compare the
live cell experiment to the fixed cell experiment, we only consider images from live
cell movies corresponding to the fixation times. Since we have between 50 and 60
movies on each gel, we can expect a higher data variance for the live cells compared
to fixed cells. Some of the investigated samples are displayed in Fig. 10.9.

We analyze the samples on $S^2$ by applying dimension reduction via principal
nested great spheres from [32]. This means we first identify the great circle which fits 
the data best (in terms of accumulated squared spherical distance), then orthogonally 
(along great circles) project data to this great circle and determine their Fréchet 
mean on this great circle, called the nested mean. Jointly, the two give our backward 
nested data descriptor. To estimate its variance, i.e. the variance of the great circle and 
the nested mean, we use B = 1000 bootstrap replicates from the data. Figure 10.10 
displays the backward nested descriptor and bootstrapped means illustrating the 
spread of the nested mean estimator. 
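
The construction of the descriptor and its bootstrap can be sketched with NumPy as follows. This is a simplified illustration and not the procedure of [32]: the great circle is fitted by a singular value decomposition, which minimizes the sum of squared sines of the spherical distances as a surrogate for the sum of squared spherical distances. The mean on the circle is found by evaluating the Fréchet function at the candidate angles given by the arithmetic mean of the angles shifted by multiples of $2\pi/n$, which to our understanding contain the minimizer. All function names are hypothetical.

```python
import numpy as np


def fit_great_circle(X):
    """Unit normal of the great circle fitted to the rows of X (SVD surrogate)."""
    _, _, vt = np.linalg.svd(X, full_matrices=False)
    return vt[-1]                        # direction of least spread


def circle_frechet_mean(theta):
    """Angle minimizing the sum of squared arc distances to the angles theta."""
    n = len(theta)
    candidates = theta.mean() + 2.0 * np.pi * np.arange(n) / n
    diff = np.angle(np.exp(1j * (candidates[:, None] - theta[None, :])))
    return candidates[np.argmin((diff ** 2).sum(axis=1))]


def backward_nested_descriptor(X):
    """Return (normal of the fitted great circle, nested mean on that circle)."""
    n = fit_great_circle(X)
    P = X - np.outer(X @ n, n)           # project along great circles ...
    P /= np.linalg.norm(P, axis=1, keepdims=True)   # ... back onto the sphere
    e1 = P[0]                            # orthonormal basis of the circle's plane
    e2 = np.cross(n, e1)
    theta = np.arctan2(P @ e2, P @ e1)
    mu = circle_frechet_mean(theta)
    return n, np.cos(mu) * e1 + np.sin(mu) * e2


def bootstrap_nested_means(X, B=1000, seed=0):
    """Nested means of B bootstrap resamples of the rows of X."""
    rng = np.random.default_rng(seed)
    idx = rng.integers(0, len(X), size=(B, len(X)))
    return np.array([backward_nested_descriptor(X[i])[1] for i in idx])


# Hypothetical example: three points on the equator at 10, 20 and 30 degrees.
ang = np.radians([10.0, 20.0, 30.0])
X = np.stack([np.cos(ang), np.sin(ang), np.zeros(3)], axis=1)
normal, nested_mean = backward_nested_descriptor(X)
print(normal, nested_mean)
```

The spread of the rows returned by `bootstrap_nested_means` gives a visual impression of the variance of the nested mean, in the spirit of Fig. 10.10.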

The lower sample size leads to considerably higher variance for live cells, as 
expected. However, fixed and live cells on the same substrates are strikingly different 
from each other. 

While live cells on the soft (E = 1 kPa) gels show little temporal evolution, the
fixed cells exhibit a development towards a dominant main field with few smaller
fields and eventually fewer rogue filaments. For the cells on stiffer gels, while the
fixed cells strengthen their main field, mainly at the expense of rogue filaments, for
live cells this effect is stronger, leading to fewer rogue filaments over time as well.
Remarkably, for the fixed cells we observe an increased number of rogue filaments
after 16 and 20 h, which does not occur for live cells. A $T^2$ hypothesis test developed
in [10] for backward nested descriptors verifies that this effect is significant.

Fig. 10.9 Visualization of the orientation field representatives on $S^2$ (blue points) from (10.1)
with backward nested data descriptor given by the best approximating great circle (blue line) with
nested mean (red star). All images correspond to 16 h on the gel. Upper left: 135 fixed cells on a
1 kPa gel; upper right: 127 fixed cells on a 10 kPa gel; lower left: 59 live cells on a 1 kPa gel; lower
right: 53 live cells on a 30 kPa gel


Table 10.1 Sample sizes of hMSC skeleton images over varying Young’s moduli and cultivation
time, first for fixed cells, then for live cells

Fixed cells
Time    1 kPa   10 kPa   30 kPa
4 h     159     168      153
8 h     163     164      153
12 h    176     171      173
16 h    135     127      147
20 h    138     126      127
24 h    166     152      152

Live cells
1 kPa   10 kPa   30 kPa
59      54       53


Fig. 10.10 Best fitting great circles with nested means. For every time and sample, nested means
of B = 1000 bootstrap samples are displayed to illustrate variance of the mean. Rows from top to
bottom: 1 kPa gel, 10 kPa gel, 30 kPa gel

Upon closer inspection, as noted in [10], when cells are described by their orientation field
decomposition, the temporal evolution of fixed cells comes to a stop at roughly
12 h and is often reversed afterwards. This hints at an increased rate of cell division, although cells
near the division process have been singled out previously, cf. Sect. 10.2. Although
these cells are not intentionally synchronized, an increased rate of cell division after a
particular time after seeding is not surprising, as trypsinization and re-seeding make the
temporal distribution of the cells over the cell cycle slightly less uniform. For live cells, such an
effect is not observed, since all movies during which the cells divide are left out. The
statistical analysis confirms our initial hypothesis that a direct comparison of results
from live and fixed cells is complicated due to the different population subsets and
points out the importance of careful experiment design including proper controls for
future studies.


10.5 Outlook 


In this chapter we have briefly illustrated the statistical biophysical toolbox
developed over 8 years in project B8 of the SFB 755, the support of which we gratefully
acknowledge, and we have applied a typical set of descriptors (mean great circle and
nested mean on it) as an example to highlight differences in the actin-myosin
cytoskeleton structure of live and fixed cells. Future applications include tracing and tracking
stress fibers over space and time and usage in many other demanding research areas.
For example, studies on migrating cells indicate various stress fiber types, classically
described as dorsal, ventral, and arcs, appearing at different locations inside
a migrating cell [21, 24-28]. Following the filament dynamics over time will give
further insight into the formation and function of stress fibers. Using our toolbox
applied to live cell imaging, it seems promising that we can come to an unbiased
statistical classification of the cytoskeleton that relates temporal and spatial
persistence to function. Recently, [29] described a novel method to analyze traction force
microscopy data, so-called model-based traction force microscopy. Here, it is
imperative to detect and mark the stress fibers of a cell in order to link forces to fiber
location and gain more insight into cellular force generation and transmission to the
substrate.


Chapter 11
Photonic Imaging with Statistical Guarantees: From Multiscale Testing
to Multiscale Estimation


Axel Munk, Katharina Proksch, Housen Li and Frank Werner


Abstract In this chapter we discuss how to obtain statistical guarantees in photonic
imaging. We start with an introduction to hypothesis testing in the context of
imaging; more precisely, we describe how to test if there is signal in a specific region of
interest (RoI) or just noise. Afterwards we extend this approach to a family of RoIs
and examine the occurring problems such as inflation of type I error and dependency
issues. We discuss how to control the family-wise error rate by different
modifications, and provide a connection to extreme value theory. Afterwards we present
possible extensions to inverse problems. Moving from testing to estimation, we finally
introduce a method which constructs an estimator of the desired quantity of interest
with automatic smoothness guarantees.


11.1 Introduction


The analysis of a photonic image typically involves a reconstruction of the measured 
object of interest which becomes the subject of further evaluation. This approach is 
frequently employed in photonic image analysis, though it can be quite problematic 
for several reasons. 


1. As the image is noisy and often inherently random, a full reconstruction relies on
the choice of a regularisation functional and corresponding a priori assumptions
on the image, often implicitly hidden in a reconstruction algorithm. Related to
this, the reconstruction relies on the choice of one or several tuning parameters. A
proper choice is a delicate task, in particular when the noise level is high and/or
inhomogeneous.

2. The sizes of the objects might be below the resolution of the optical device which 
further hinders a full reconstruction. 

3. As the resolution increases, the object to be recovered becomes random in itself 
as its fine structure then depends on, e.g., the conformational states of a protein 
and the interpretation of the recovered object might be an issue. 


It is the aim of this chapter to provide a careful discussion of such issues and to 
address the analysis of photonic images with statistical guarantees. This will be done 
in two steps. In Sect. 11.2 we survey some recent methodology, which circumvents 
a full recovery of the image, to extract certain relevant information in such difficult 
situations mentioned above. Based on this (see Sect. 11.3), we will extend such 
methods also to situations in which a full reconstruction is reasonable, but still a 
difficult task, e.g., when the multiscale nature of the object has to be recovered. In 
both scenarios we will put a particular emphasis on statistical guarantees for the 
provided methods. 

An example where a full recovery of the object of interest is typically not a valid 
task is depicted in the centre of Fig. 11.1 where a detail of a much larger image is 
shown (see Fig. 1 in [1] for the full image). The investigated specimen consists of 
DNA origami which have been designed in such a way that each of the signal clusters 
contains up to 24 fluorescent markers, arrayed in two strands of up to 12, having a 
distance of 71 nanometers (nm) (see left panel of Fig. 11.1 for a sketch of such a 
DNA origami). As the ground truth is basically known, this serves as a real world 
phantom. 

Data were recorded with a STED (STimulated Emission Depletion) microscope 
at the lab of Stefan Hell of the Department of NanoBiophotonics of the Max Planck 
Institute for Biophysical Chemistry. In contrast to classical fluorescence microscopy, 
the resolution in STED microscopy is in theory not limited and can be enhanced by 
increasing the intensity of the depletion laser [2]. However, this increase comes at 
the price of a decrease in intensity of the focal spot, which bounds the resolution in 
practice. Therefore a convolution of the underlying signal with the PSF of the STED 
microscope is unavoidable and a full reconstruction of the DNA origami (or the 
shape of the markers) appears to be difficult. However, for most purposes this is also
not relevant. Instead, less ambitious tasks will still provide important information,
e.g., the location of these fluorescent markers. This can be done via a statistical test,
which is presumably a much simpler task than reconstruction (estimation in statistical
terminology) and it can be tailored towards answering particular questions such as “How
many strands of markers are there?” and “Where are the DNA origamis located?”.
The right panel of Fig. 11.1 shows the locations of markers as found by such a
statistical test (from the data in the middle panel in Fig. 11.1) which will be introduced
later on.

Fig. 11.1 (Detail of Fig. 1 in [1]) Left: Sketch of single DNA origami, middle: detail of image of
randomly distributed DNA origami, right: detected strands of markers


11.2 Statistical Hypothesis Testing 


11.2.1 Introduction 


We will see that proper testing in the above example (Fig. 11.1) is already a complex 
task. Therefore, in this section, we first introduce the concept of statistical testing in 
a basic setting. The first step in statistical hypothesis testing is to define the so-called 
null hypothesis, H, and the alternative hypothesis, K: 


H : “Hypothesis to be disproven”
K : “Hypothesis to be substantiated”.


For example, H might correspond to the hypothesis that no marker is contained
in a certain given region of the image, while K corresponds to the contrary, i.e., that there is
at least one marker in this region. A statistical significance test is a decision rule
which, based on given data, allows one to discriminate between H and K. If a certain
criterion is met, H is rejected and K is assumed. If not, H cannot be rejected. For
instance, the photon count in a certain given region of a noisy image may give rise to
the belief that at least one marker is contained therein. This could be tested, for
example, by checking whether the total number of photons detected in this region is
larger than a certain threshold. However, due to the inherent randomness of photon
emissions and background noise, such a finding is associated with a (certain) risk of
being incorrect. A statistical test aims to control this risk. Hence, prior to performing
a statistical test, a tolerable risk $\alpha$ is specified, typically in the range of 0.01 up to
0.1, corresponding to accepting the error rate that, on average, in at most $\alpha \cdot 100\%$ of
the cases the null hypothesis H is falsely rejected. Such an $\alpha$ is called the significance
level. This is written as


\[ P(\text{“H is rejected although H is true”}) \le \alpha. \tag{11.1} \]


Here, P stands generically for all possible distributions under H, and P(A) denotes
the probability¹ of an event A. If the test criterion is chosen such that (11.1) holds,
the corresponding test is called a level-$\alpha$ test. The ability of a test to correctly reject
H is called detection power. If H corresponds to the hypothesis that no marker is
located in a certain given region, the test (i.e., the data-based decision procedure) is
constructed in such a way that the probability of falsely detecting a marker in an
empty region is controlled by $\alpha$. H and K are chosen in such a way that the false rejection
of H is considered the more serious error and is controlled in advance. In our
scenario, this means that we consider the wrong detection of a fluorophore a more
serious error than missing a fluorophore.


11.2.2 A Simple Example 


To demonstrate this concept more rigorously, we now consider a very simple Gaussian
model, which can be seen as a proxy for more complicated models. Assume that
one observes data

\[ Y_i = \mu_i + \varepsilon_i, \qquad i = 1, \ldots, n, \tag{11.2} \]

where $\mu_i \ge 0$ denote possible “signals” hidden in the observations $Y_i$, and $\varepsilon_i \sim \mathcal{N}(0, 1)$
are independent normal random variables with variance $\sigma^2 = 1$ (for simplicity).
Assume for the moment that all signals have the same strength, $\mu_i = \mu \ge 0$. The
interest lies in establishing that $\mu > 0$, i.e., the presence of such a signal in the data. Hence,
we set

\[ H : \mu = 0 \ \text{(to be disproven)} \quad \text{vs.} \quad K : \mu > 0 \ \text{(to be substantiated)}. \tag{11.3} \]
¹ More formally, (11.1) is meant as $P(\text{“H is rejected although it holds”}) \le \alpha$ under all possible
configurations under H. Only where necessary will this be made explicit in the following by an
additional subscript.

The goal is now to find a suitable criterion which, given $Y_1, \ldots, Y_n$, allows one to
decide in favour of or against H in such a way that the error of wrongly rejecting H is
controlled by $\alpha$. From a statistical perspective, the aim is to infer about the mean of
the $Y_i$, which should be close to the empirical mean $\bar{Y} = \frac{1}{n} \sum_{i=1}^{n} Y_i$ of the data. An
intuitive decision rule would be to check whether $\bar{Y}$ is “clearly” larger than zero,
$\bar{Y} > \gamma_\alpha$, say, for a suitable threshold $\gamma_\alpha > 0$. We consider the normalized (i.e., with
unit variance) sum


\[ T(Y) := \frac{1}{\sqrt{n}} \sum_{i=1}^{n} Y_i \]


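and choose, for prescribed $\alpha > 0$, the threshold $\gamma_\alpha$ such that we have equality in
(11.1). As under the assumption H we have that $\mu = 0$, this gives²

\[
P(\text{H is falsely rejected}) = P\Bigl( \frac{1}{\sqrt{n}} \sum_{i=1}^{n} \varepsilon_i > \gamma_\alpha \Bigr) = P\bigl( \mathcal{N}(0, 1) > \gamma_\alpha \bigr) = 1 - \Phi(\gamma_\alpha) = \alpha, \tag{11.4}
\]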


since $\frac{1}{\sqrt{n}} \sum_{i=1}^{n} \varepsilon_i \sim \mathcal{N}(0, 1)$. Here $\Phi$ denotes the cumulative distribution function of
a standard normal random variable: $\Phi(x) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} e^{-y^2/2} \, dy$. If H holds true, i.e.,
$\mu = 0$, (11.4) holds if we choose $\gamma_\alpha = z_{1-\alpha}$, where $z_{1-\alpha}$ is the $(1-\alpha)$-quantile of
the standard normal distribution, e.g., $z_{1-\alpha} = 1.6449$ when $\alpha = 0.05$. The statistical
test that rejects H whenever $T(Y) > z_{1-\alpha}$ is called the Z-test and is a level-$\alpha$ test.
Furthermore, if a signal is present, i.e., $\mu_i = \mu > 0$, we have that


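\[
P_{\mu_i = \mu}(\text{H is correctly rejected}) = P_{\mu_i = \mu}\Bigl( \frac{1}{\sqrt{n}} \sum_{i=1}^{n} (\mu + \varepsilon_i) > z_{1-\alpha} \Bigr)
= 1 - P\bigl( \mathcal{N}(0, 1) > \mu\sqrt{n} - z_{1-\alpha} \bigr) = 1 - \bigl( 1 - \Phi(\mu\sqrt{n} - z_{1-\alpha}) \bigr).
\]

Since $1 - \Phi(x) \le \exp(-\tfrac{1}{2} x^2)$ for $x \ge 1/\sqrt{2\pi}$ (see, e.g., [3], inequality (1.8)),
we obtain

\[
P_{\mu_i = \mu}(\text{H is correctly rejected}) \ge 1 - \exp\Bigl( -\frac{n\mu^2}{2} \Bigl( 1 - \frac{z_{1-\alpha}}{\sqrt{n}\,\mu} \Bigr)^2 \Bigr)
\ge 1 - \exp\Bigl( -\frac{1}{4}\, n \mu^2 \Bigr),
\]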


for sufficiently large n. This means that, as the number n of data points grows, the
detection power (in the case $\mu > 0$) of the Z-test converges to 1 exponentially
fast. This test has been derived in an intuitive way, but it can be proven that it is
a uniformly most powerful (UMP) test (see [4], Chap. 3.4). This means that for all
$\mu > 0$ (i.e., whenever the alternative K holds) the detection power is maximized among all
level-$\alpha$ tests, i.e., all possible decision rules one might think of which satisfy (11.1)
in our setup based on the data $Y_1, \ldots, Y_n$.

² Here P corresponds to only one configuration of distributions, namely when all $\mu_i = \mu = 0$.

Fig. 11.2 Three different signals (upper row) and noisy signals (lower row)


Z-test 

Comparison of the normalized empirical mean of a set of measurements to
a given threshold to assess a difference in location from a given constant $\mu_0$. When
$\mu_0 = 0$, the Z-test rejects $H : \mu = \mu_0 = 0$ in favor of $K : \mu > 0$ if

\[ \frac{1}{\sqrt{n}} \sum_{i=1}^{n} Y_i > z_{1-\alpha}. \]

This is the best possible test at level $\alpha$ if the data $Y_1, \ldots, Y_n$ are independent
and $\mathcal{N}(\mu, 1)$ distributed.
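The Z-test and its detection power can be sketched in a few lines of Python. This is a minimal illustration under the Gaussian model (11.2) with unit variance, using only the standard library; the function names `z_test` and `z_test_power` are chosen here for illustration and do not come from the text.

```python
import math
from statistics import NormalDist


def z_test(y, alpha=0.05):
    """One-sided Z-test of H: mu = 0 against K: mu > 0 for unit-variance data.

    Returns the statistic T(Y) = sum(y) / sqrt(n), the threshold z_{1-alpha}
    and the decision (True means that H is rejected).
    """
    statistic = sum(y) / math.sqrt(len(y))
    threshold = NormalDist().inv_cdf(1.0 - alpha)
    return statistic, threshold, statistic > threshold


def z_test_power(mu, n, alpha=0.05):
    """Exact detection power Phi(sqrt(n) * mu - z_{1-alpha}) for a constant signal mu."""
    threshold = NormalDist().inv_cdf(1.0 - alpha)
    return NormalDist().cdf(math.sqrt(n) * mu - threshold)


if __name__ == "__main__":
    import random

    rng = random.Random(0)  # fixed seed, only to make the demo reproducible
    sample = [0.3 + rng.gauss(0.0, 1.0) for _ in range(100)]
    print(z_test(sample))
    print([round(z_test_power(0.3, n), 3) for n in (10, 100, 1000)])
```

For $\mu = 0$ the power function returns the level $\alpha$ itself, and for fixed $\mu > 0$ it increases towards 1 as n grows, in line with the exponential bound derived above.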


11.2.3 Testing on an Image 


Subsequently, we consider three illustrative synthetic images of size 60 × 60, shown
in Fig. 11.2 (see the upper row for the noiseless versions and the lower row for
the noisy images). These serve the purpose of explaining how to extend the above
simple Z-test to detect a signal in an image, which is a more complex task. To
illustrate, we assume for the moment that in these images the intensity at each pixel
$Y_i$, $i = 1, \ldots, n$, follows a $\mathcal{N}(\mu_i, 1)$ distribution, where each $\mu_i$ takes one of the
four values 0, 2, 3.5 and 5 (see Fig. 11.2). Now, our goal is to segment the image into
regions with signal and empty regions while maintaining statistical error guarantees.
Note that we do not aim to recover the exact value of each $\mu_i$, only whether it
is positive (signal) or not (no signal). To this end, we will perform many “local” statistical
Z-tests on different (and possibly overlapping) regions of this image. We will discuss
several approaches (Scenarios 1-5) which provide a step-by-step derivation of our
final solution (Scenario 5). As it turns out, the crucial issue will be to control the
statistical error of wrong decisions of all these tests simultaneously (overall error).


Scenario 1 (Known position, one test for central 20 × 20 square) Assume for now
that we are only interested in whether there is some signal in the central 20 × 20 square
(framed in blue in the upper row of Fig. 11.3), i.e., we fix the location to be investigated.
For this task, we now perform a Z-test at level $\alpha = 0.05$ for the central square with
n = 20 × 20 = 400 pixels, i.e., the test statistic

\[ T_{\text{central } 20 \times 20 \text{ square}}(Y) = \frac{1}{20} \sum_{i \,\in\, \text{central } 20 \times 20 \text{ square}} Y_i \tag{11.5} \]

is compared to $z_{1-\alpha} = 1.6449$. The test allows for exactly two outcomes: rejection
(of the hypothesis H : no signal in the 20 × 20 square) or no rejection. The results are
depicted in the second row of Fig. 11.3. In each of the three test images, the
Z-test correctly recognizes that there is signal in the central square, and to visualize
this, the square is marked in green. The test decision is correct; however, we cannot
draw more (localized) information from this test. Nevertheless, this gives us a first
guide on how to obtain a segmentation into regions, our final task. Note that the Z-test,
as we derived it in Sect. 11.2.2, is still applicable although we no longer assume under the
alternative that all signals have the same strength (recall Sect. 11.2.2). This will
only affect the power. What is crucial is that the test controls the error at level $\alpha$ correctly
under the hypothesis that all signals $\mu_i = 0$.


Given a region of interest (RoI), performing one test on the whole region, as done
in the previous scenario, only allows inference on the entire RoI, i.e., on the largest scale
there is; finer details cannot be discerned. In the following step we consider the finest
possible scale, i.e., tests on single pixels, hoping that we can extract more detailed
information on different parts of the image simultaneously.


Scenario 2 (Known position, pixel-wise tests in 20 × 20 square) Assume again that
we are only interested in testing within the central 20 × 20 square. We now perform
a test for each entry in the 20 × 20 central region separately, in total 400 tests. The
test statistics $T_{i\text{-th pixel}}(Y)$ are given by the pixel values. For simplicity, we consider
tests for the presence of a signal at pixel i which are only based on the observation
$Y_i$ at pixel i, i.e.,

\[ T_{i\text{-th pixel}}(Y) = Y_i, \tag{11.6} \]

and are compared to $z_{1-\alpha} = 1.6449$. Again, each test allows for two outcomes:
rejection or no rejection. In the second row of Fig. 11.4, exemplary results are depicted
(all pixels for which positive test decisions have been made are marked green).

Fig. 11.3 Noisy signals (upper row) and test results from Scenario 1 (lower row). The square is
marked in green to show that the test was significant for all three images
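Scenarios 1 and 2 can be tried out on a toy image. The following sketch is an illustration only: the synthetic image (a constant signal `mu` in the central 20 × 20 block of a 60 × 60 noise image) is an assumption made here and is not one of the three test images of Fig. 11.2, and all function names are hypothetical.

```python
import math
import random
from statistics import NormalDist


def make_image(size=60, block=20, mu=2.0, seed=1):
    """Noisy size x size image: N(mu, 1) in the central block, N(0, 1) elsewhere."""
    rng = random.Random(seed)
    lo = (size - block) // 2
    hi = lo + block
    image = [[(mu if lo <= r < hi and lo <= c < hi else 0.0) + rng.gauss(0.0, 1.0)
              for c in range(size)] for r in range(size)]
    return image, (lo, hi)


def central_square_test(image, lo, hi, alpha=0.05):
    """Scenario 1: a single Z-test on the central square, statistic as in (11.5)."""
    values = [image[r][c] for r in range(lo, hi) for c in range(lo, hi)]
    statistic = sum(values) / math.sqrt(len(values))
    return statistic > NormalDist().inv_cdf(1.0 - alpha)


def pixelwise_tests(image, lo, hi, alpha=0.05):
    """Scenario 2: one Z-test per pixel of the central square, statistic as in (11.6)."""
    threshold = NormalDist().inv_cdf(1.0 - alpha)
    return [[image[r][c] > threshold for c in range(lo, hi)] for r in range(lo, hi)]


if __name__ == "__main__":
    image, (lo, hi) = make_image()
    print("Scenario 1 rejects H:", central_square_test(image, lo, hi))
    hits = pixelwise_tests(image, lo, hi)
    print("Scenario 2 detected fraction:", sum(map(sum, hits)) / (hi - lo) ** 2)
```

The single aggregated test answers only one yes/no question, while the pixel-wise tests return a map of decisions in which a noticeable fraction of signal pixels is missed.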


It is obvious that Scenario 2 gives more detailed information on the signal, but at
the expense of several false detections. This is an important issue and will be discussed
in more detail in the following section. It is also obvious that parts of the weak signal
are missed (see Fig. 11.4: only 71.25% of the active pixels are detected in the left
test image and 85% in the second one). This is due to the fact that the local tests do
not take into account neighboring information (surrounding data) from which they
could borrow detection strength. This will also be refined in the subsequent sections.

Fig. 11.4 Noisy signals (upper row) and the corresponding test results from Scenario 2 (lower
row). The significant pixels are marked green, insignificant pixels are blue


False test decisions
There are two kinds of possible false test decisions:
1. Type I error (probability of its occurrence is controlled by $\alpha$).
Here: Selection of a RoI although it does not contain any signal (see lower
right panel of Fig. 11.4).
2. Type II error (a missed rejection, not controlled).
Here: Failing to select a RoI that contains signal (see lower left panel of
Fig. 11.4).


11.2.4 Testing Multiple Hypotheses 


In Scenario 2 in the previous section we applied 400 single Z-tests in the central 
square of the synthetic image. It is obvious from Fig. 11.4 that this approach suffers 
from many false detections, in particular when the signal gets sparser (see lower 
right plot in Fig. 11.4). This issue becomes even more severe if the number of tests 
increases, as the following test scenario illustrates. 


Scenario 3 (Unknown position, pixel-wise tests, whole image) If we do not have
prior information on the particular region which we should investigate, we need
to scan the entire image. Generalizing Scenario 2 to the case of unknown signal
position (the RoI is now the full image), all single pixels of the entire image
are tested. This results in 3600 tests. The results are shown in the second row of
Fig. 11.5. Obviously, the number of false rejections increases with the number of
tests. This did not happen by chance; it is a systematic flaw which we
encounter when we naively perform many tests on the same image simultaneously.

Fig. 11.5 Noisy signal (upper row) and the corresponding test results from Scenario 3 (lower row).
The significant pixels are marked green
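The systematic nature of this flaw is easy to reproduce. The sketch below, a hedged illustration with a hypothetical function name, applies unadjusted pixel-wise Z-tests to pure noise, so that every rejection is a false one; on average $N \alpha$ of the N tests reject, i.e., 180 out of 3600 for $\alpha = 0.05$.

```python
import random
from statistics import NormalDist


def count_false_rejections(n_tests, alpha=0.05, seed=0):
    """Number of rejections among n_tests unadjusted pixel-wise Z-tests on pure noise."""
    rng = random.Random(seed)
    threshold = NormalDist().inv_cdf(1.0 - alpha)
    return sum(rng.gauss(0.0, 1.0) > threshold for _ in range(n_tests))


if __name__ == "__main__":
    for n_tests in (400, 3600):
        observed = count_false_rejections(n_tests)
        print(n_tests, "tests:", observed, "false rejections, expected", n_tests * 0.05)
```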


11.2.4.1 Number of False Rejections 


The statistical control of false rejections is a general problem one encounters in 
multiple testing (i.e., testing many hypotheses simultaneously on the same data). The 
increase of false rejections with an increasing number of tests is called the multiplicity
effect.

Figure 11.6 shows the probabilities that out of n independent Z-tests, at least
1 (solid line), 10 (dashed line), 75 (dotted line) and 150 (dash-dotted line) false
rejections occur. The curves suggest that in the situation of Scenario 3 we need to
expect at least 150 false detections. In fact, the probability that many wrong rejections
are made within N tests, each at level $\alpha$, performed on a data set converges to 1
exponentially fast.

Fig. 11.6 Exact probabilities (y-axis) that out of n (x-axis) independent Z-tests with $\alpha = 0.05$,
at least 1 (solid line), 10 (dashed line), 75 (dotted line) and 150 (dash-dotted line) false rejections
occur. Here, n = 1, 2, ..., 4000, where the respective probabilities are zero as long as n is smaller
than 10, 75 or 150, respectively
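Probabilities of this kind follow from the binomial distribution of the number of false rejections among n independent level-$\alpha$ tests of true null hypotheses. A minimal sketch (the function name `prob_at_least` is an assumption made here, and the computation is done in log space only to avoid overflow of the binomial coefficients):

```python
import math


def prob_at_least(k, n, alpha=0.05):
    """P(at least k false rejections among n independent level-alpha tests of true nulls)."""
    if k <= 0:
        return 1.0
    if k > n:
        return 0.0
    log_a, log_b = math.log(alpha), math.log1p(-alpha)
    below = 0.0
    for j in range(k):  # add P(exactly j false rejections) for j = 0, ..., k - 1
        log_pmf = (math.lgamma(n + 1) - math.lgamma(j + 1) - math.lgamma(n - j + 1)
                   + j * log_a + (n - j) * log_b)
        below += math.exp(log_pmf)
    return max(0.0, 1.0 - below)


if __name__ == "__main__":
    for k in (1, 10, 75, 150):
        print(k, [round(prob_at_least(k, n), 4) for n in (100, 1000, 3600)])
```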

Lemma 11.1 If $0 < \alpha < 1/2$, $N \ge 2$ and $k \le N \log(1 + \alpha)/\log(N)$, we have that

\[ P(\text{at least $k$ out of $N$ false rejections}) \ge 1 - (1 - \alpha^2)^N. \]

Proof The random variables $I\{i\text{-th test rejects}\}$, where $I$ denotes the indicator
function, follow a Bernoulli distribution with parameter $\alpha$. Therefore, if $\alpha < 1/2$,
we can estimate the probability that out of $N \ge 2$ tests at least $k$ false rejections are
made, as

\[
P(\text{at least $k$ out of $N$ false rejections}) = 1 - \sum_{j=0}^{k-1} P(\text{exactly $j$ false rejections})
= 1 - \sum_{j=0}^{k-1} \binom{N}{j} \alpha^j (1 - \alpha)^{N-j} \ge 1 - (1 - \alpha)^N \sum_{j=0}^{k-1} \binom{N}{j}.
\]

It follows, e.g., by induction over $k$ for any $N \ge 2$, that $\sum_{j=0}^{k-1} \binom{N}{j} \le N^k$, which
implies

\[ P(\text{at least $k$ out of $N$ false rejections}) \ge 1 - (1 - \alpha)^N N^k. \]

For $k \le N \log(1 + \alpha)/\log(N)$ we have $N^k \le (1 + \alpha)^N$ and thus find

\[ P(\text{at least $k$ out of $N$ false rejections}) \ge 1 - (1 - \alpha^2)^N. \]

Hence, the probability of making at least k out of N false rejections converges to
1 exponentially fast as $N \to \infty$.

To reduce the number of false detections, so-called multiplicity adjustments have
to be made. Two general approaches in this regard are the control of the family-wise
error rate (FWER) and of the false discovery rate (FDR). Here, we will mainly focus
on the FWER but will briefly discuss FDR control in Sect. 11.2.9. For further reading
we refer to the monograph [5] and the references given there.


Multiplicity effect 

If multiple tests are performed without accounting for multiplicity, the chances 
of making many type I errors are quite large if the false null hypotheses are 
sparse (see Fig. 11.5). 


11.2.4.2 Control of FWER 


One possible way to deal with multiplicity is to control the family-wise error rate
(FWER), that is, to control the probability of making any wrong rejection among all
tests that are performed. Assume model (11.2) and denote by $\mu = (\mu_1, \ldots, \mu_n)$ the
vector of all true means and by $P_\mu$ the probability under configuration $\mu$. In the
previous example of imaging, the sample size n corresponded to the number of
pixels. Scenarios 2 and 3 were based on many single tests (on many single pixels).
Such single tests will be referred to as local tests in the sequel. Each of the N (say)
local tests corresponds to its own (local) hypothesis $H_i$ versus $K_i$. For example, in
the setup of Scenario 3, a local hypothesis is $H_i : \mu_i = 0$ versus the local alternative
$K_i : \mu_i > 0$, for some $i = 1, \ldots, n$. In this case n = N when all local hypotheses
are tested. If only a few are tested, then $N \ll n$. If in addition all 2 × 2 RoIs are
tested, a total of $N \approx 2n$ tests are performed.

Assume now that all local tests $H_i$ vs. $K_i$ are performed, each at error level $\alpha/N$.
Then the risk of making any wrong rejection is controlled at level $\alpha$, that is, the
FWER is controlled.


Theorem 11.1 (Bonferroni correction) Given N testing problems $H_i$ vs. $K_i$, $i = 1, \ldots, N$,
and local tests at level $\alpha/N$, we have for any configuration $\mu$

\[ P_\mu(\text{“at least one wrong rejection”}) \le \alpha. \]


Proof

\[
P_\mu(\text{“at least one wrong rejection”}) \le \sum_{i=1}^{N} P_\mu(\text{“$i$-th test falsely rejects”})
\le \sum_{i=1}^{N} P_{\mu_i = 0}(\text{“$i$-th test falsely rejects”}) \le \sum_{i=1}^{N} \frac{\alpha}{N} = \alpha. \tag{11.7}
\]

Since the right-hand side is independent of $\mu$, we say that the FWER is controlled
in the strong sense. As a consequence, each finding can be considered $\alpha$-significant
and hence can be used as a segment for the final segmentation. Performing tests at
an adjusted level such as $\alpha/N$ instead of $\alpha$ is called level-adjusted testing, and the
multiple test “reject those $H_i$ which are significant at the adjusted level $\alpha/N$” is
called the Bonferroni procedure. We stress that although Theorem 11.1 was formulated
for the special case of signal detection in independent Gaussian noise, the Bonferroni
procedure strongly controls the FWER in much more generality and in particular
without any assumptions on the dependency structure between different tests (see, e.g.,
[5], Chap. 3.1.1, for a more detailed discussion).


Scenario 4 (Unknown position, pixel-wise, Bonferroni adjustment) In the situation
of Scenario 3, we now perform a Bonferroni procedure for the entire image, i.e., for
all 60 × 60 = 3600 entries (see Fig. 11.7). The local testing problems are

\[ H_i : \text{“No signal in $i$-th pixel”, i.e., } \mu_i = 0 \quad \text{vs.} \quad K_i : \mu_i > 0. \]

Now n = N = 3600 and $\alpha/N \approx 1.3889 \times 10^{-5}$ for $\alpha = 0.05$. In this scenario
all single entries are compared to $z_{1-\alpha/N} \approx 4.19096$. (Recall that in Scenarios 2
and 3 we compared each entry to the much smaller threshold 1.6449 and note that
any level adjustment corresponds to an increase of the threshold for testing.) The
result is shown in the second row of Fig. 11.7. While none of these tests produced
a false finding, too few detections have been made overall, as only parts of the
signal have been detected.
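The Bonferroni procedure of Scenario 4 amounts to raising the common threshold from $z_{1-\alpha}$ to $z_{1-\alpha/N}$. A minimal sketch for one-sided Z-tests on unit-variance data (the function names are hypothetical, introduced here for illustration):

```python
from statistics import NormalDist


def bonferroni_threshold(n_tests, alpha=0.05):
    """Threshold z_{1 - alpha/N} used by every local test in the Bonferroni procedure."""
    return NormalDist().inv_cdf(1.0 - alpha / n_tests)


def bonferroni_rejections(test_statistics, alpha=0.05):
    """Indices i of the local hypotheses H_i rejected at the adjusted level alpha/N."""
    threshold = bonferroni_threshold(len(test_statistics), alpha)
    return [i for i, t in enumerate(test_statistics) if t > threshold]


if __name__ == "__main__":
    print(bonferroni_threshold(1), bonferroni_threshold(3600))
    print(bonferroni_rejections([0.0, 5.0, 4.0, 1.7]))
```

With N = 3600 and $\alpha = 0.05$ the first function reproduces the threshold of about 4.19 quoted in Scenario 4.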


Bonferroni multiplicity adjustment 

Adjustment (increase) of the thresholds when multiple tests are performed 
simultaneously to control the overall type I error, i.e., the FWER. This is a 
very general but also a conservative method (in particular if the signal is not 
sparse). 


11.2.5 Connection to Extreme Value Theory 


There is a close connection between the control of the FWER in the situation
of Scenario 3 and extreme value theory. Recall that the aim is to control
$P_\mu(\text{“at least one wrong rejection”})$ for any configuration $\mu$. By monotonicity, we
have that

\[ P_\mu(\text{“at least one wrong rejection”}) \le P_{\mu = 0}(\text{“at least one wrong rejection”}), \]

which implies that the FWER is controlled if we choose the threshold q for our
multiple tests such that

\[ P_{\mu = 0}(\text{“at least one wrong rejection”}) = P\bigl( \exists\, i \in \{1, \ldots, N\} : \varepsilon_i > q \bigr) \le \alpha. \tag{11.8} \]

Fig. 11.7 Noisy signals (upper row) and the corresponding test results from Scenario 4 (FWER-controlled,
lower row). The significant pixels are marked green, insignificant pixels are blue


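Now, since $P(\exists\, i \in \{1, \ldots, N\} : \varepsilon_i > q) = P(\max\{\varepsilon_1, \ldots, \varepsilon_N\} > q)$, q can
be chosen as the $(1 - \alpha)$-quantile of $\max\{\varepsilon_1, \ldots, \varepsilon_N\}$ under the global null
hypothesis, H,

\[ H = \bigcap_{i=1}^{3600} H_i : \text{“No signal at all”}, \tag{11.9} \]

i.e., $\mu_i = 0$ for all $i = 1, \ldots, 3600$. In this case we have equality in (11.8). Note that

\[
\begin{aligned}
P(\max\{\varepsilon_1, \ldots, \varepsilon_N\} > q) &= 1 - P(\max\{\varepsilon_1, \ldots, \varepsilon_N\} \le q) \\
&= 1 - P(\varepsilon_1 \le q \text{ and } \varepsilon_2 \le q \text{ and } \ldots \text{ and } \varepsilon_N \le q) \\
&= 1 - P(\varepsilon_1 \le q)^N = 1 - \Phi(q)^N.
\end{aligned}
\]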


Therefore,

\[ P(\max\{\varepsilon_1, \ldots, \varepsilon_N\} > q) = \alpha \iff \Phi(q)^N = 1 - \alpha \iff \Phi(q) = (1 - \alpha)^{1/N}, \]

which yields $q = z_{(1-\alpha)^{1/N}}$. Since

\[ (1 - \alpha)^{1/N} \le 1 - \alpha/N \]

by Bernoulli’s inequality, strict monotonicity of $\Phi^{-1}$ implies the same ordering of
the thresholds, i.e., $z_{(1-\alpha)^{1/N}} \le z_{1-\alpha/N}$. However, it is easy to show that for $N \ge 1$
and $\alpha < 1/2$
\[ 1 - \frac{\alpha + \alpha^2}{N} \le (1 - \alpha)^{1/N} \]


and therefore the difference between $z_{(1-\alpha)^{1/N}}$ and $z_{1-\alpha/N}$ is quite small, e.g.,
$z_{(1-0.05)^{1/3600}} \approx 4.18516$ and $z_{1-0.05/3600} \approx 4.19096$.
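The two thresholds can be compared numerically. The sketch below (function names are assumptions made for illustration) evaluates the exact quantile of the maximum of N independent standard normal variables, the Bonferroni threshold, and the asymptotic scale $\sqrt{2\log(N)}$:

```python
import math
from statistics import NormalDist


def max_quantile_threshold(n_tests, alpha=0.05):
    """(1 - alpha)-quantile of the maximum of n_tests independent N(0, 1) variables."""
    return NormalDist().inv_cdf((1.0 - alpha) ** (1.0 / n_tests))


def bonferroni_threshold(n_tests, alpha=0.05):
    """Bonferroni threshold z_{1 - alpha/N}."""
    return NormalDist().inv_cdf(1.0 - alpha / n_tests)


if __name__ == "__main__":
    for n_tests in (3600, 10**6, 10**9):
        print(n_tests,
              round(max_quantile_threshold(n_tests), 5),
              round(bonferroni_threshold(n_tests), 5),
              round(math.sqrt(2.0 * math.log(n_tests)), 5))
```

For N = 3600 both thresholds still lie slightly above $\sqrt{2\log(N)} \approx 4.05$; the approximation by $\sqrt{2\log(N)}$ is an asymptotic statement for large N.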
The following lemma shows that $z_{1-\alpha/N} \approx \sqrt{2\log(N)}$ (and therefore also
$z_{(1-\alpha)^{1/N}} \approx \sqrt{2\log(N)}$).


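Lemma 11.2 There exists $N_0 \in \mathbb{N}$ such that for all $N \ge N_0$

\[ \sqrt{2\log(N)} - \frac{\log\log(N)}{\sqrt{2\log(N)}} \le z_{1-\alpha/N} \le \sqrt{2\log(N)}. \]

Proof To bound the normal quantiles from above and below, we use

\[ \frac{\varphi(x)}{x + \frac{1}{x}} \le 1 - \Phi(x) \le \frac{\varphi(x)}{x}, \qquad x > 0 \]

[6, see inequality (10)], where $\varphi$ and $\Phi$ denote the density and cdf of the standard
normal distribution, respectively. For sufficiently large N,

\[ 1 - \Phi\bigl(\sqrt{2\log(N)}\bigr) \le \frac{\varphi\bigl(\sqrt{2\log(N)}\bigr)}{\sqrt{2\log(N)}} = \frac{1}{N\sqrt{4\pi\log(N)}} \le \frac{\alpha}{N}, \]

and therefore

\[ 1 - \frac{\alpha}{N} \le \Phi\bigl(\sqrt{2\log(N)}\bigr) \iff z_{1-\alpha/N} = \Phi^{-1}\Bigl(1 - \frac{\alpha}{N}\Bigr) \le \sqrt{2\log(N)}, \]

so the right-hand side follows. We further have that

\[
1 - \Phi\Bigl( \sqrt{2\log(N)} - \frac{\log\log(N)}{\sqrt{2\log(N)}} \Bigr)
\ge \frac{\varphi\Bigl( \sqrt{2\log(N)} - \frac{\log\log(N)}{\sqrt{2\log(N)}} \Bigr)}{2\sqrt{2\log(N)}}
= \frac{\sqrt{\log(N)}}{4\sqrt{\pi}\,N} \exp\Bigl( -\frac{(\log\log(N))^2}{4\log(N)} \Bigr)
\ge \frac{\alpha}{N}
\]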


for sufficiently large N, and the left-hand side follows.


11.2.5.1 Towards Better Detection Properties


The Bonferroni approach is valid in great generality. Nevertheless, as we have seen in
Fig. 11.7, if applied pixel-wise the level adjustment (and the resulting increase of the
threshold) is (much) too strict for our purposes. This is not caused by the Bonferroni
adjustment per se, as it can be shown that the detection power of the Bonferroni
approach cannot be considerably improved in general [7, Sect. 1.4.1]. The issue is
that we have only considered each single pixel as input for our local tests. Therefore,
we will extend this from single pixels to larger systems of RoIs, which allow us to
“borrow strength from neighbouring pixels”. This makes sense as soon as the signal
has some structure, e.g., whenever signal appears in (small) clusters or filament-like
structures. To see this, suppose that for $k > 1$ we have $\mu_1 = \mu_2 = \ldots = \mu_k = \mu$.
An uncorrected pixel-wise Z-test would compare each $Y_i$ to the threshold $z_{1-\alpha}$, i.e.,
signal in a pixel would be detected if

\[ Y_i = \underbrace{(Y_i - \mu)}_{\sim \mathcal{N}(0, 1)} + \mu > z_{1-\alpha}. \]

This is almost impossible if $\mu$ is too small or the noise takes a negative value, and
becomes even worse if a multiplicity adjustment is performed. If we instead group
the first k pixels together and perform a grouped Z-test, i.e., compare $\frac{1}{\sqrt{k}} \sum_{i=1}^{k} Y_i$ to
$z_{1-\alpha}$, a signal would be detected if

\[ \sqrt{k}\,\mu + \mathcal{N}(0, 1) > z_{1-\alpha}. \]

This way, the signal is “magnified” by a factor $\sqrt{k}$. Unfortunately, performing, for
any k, every test that groups k pixels together, and thereby incorporating the fact that
positions i and numbers k of relevant pixels are in general not known in advance, is
infeasible.³ However, if the data is clustered spatially, we can construct a reasonable
test procedure that follows a similar path. Instead of performing all tests that group any
configuration of k pixels, we perform all tests that merge all pixels in a k × k square,
for many different values of k, and “scan” the image for signal in such regions in a
computationally and statistically feasible way. Now the local tests become (locally
highly) correlated (see Sect. 11.2.6) and a simple Bonferroni adjustment does not
provide the best detection power any more, although (11.7) is still valid. This will
be the topic of Sects. 11.2.6 and 11.2.7.


³ One issue is the computational limitation. Additionally, this carries a systematic statistical burden, as
tests then have to be performed over all possible subsets of the image. For n pixels, there are $2^n$ such
subsets, a collection of sets so large that the resulting error probabilities can no longer be controlled
in a reasonable way.

Amplification of the signal strength by aggregation
If a signal is spatially grouped in clusters, cluster-wise tests can increase its
detectability. The average of all signal strengths inside the test region is magnified
by a factor of $\sqrt{\text{size of cluster}}$.
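The gain from aggregation can be quantified directly from the detection condition $\sqrt{k}\,\mu + \mathcal{N}(0, 1) > \text{threshold}$. The following sketch is an illustration with hypothetical helper names; it assumes k pixels of equal signal strength `mu` and unit noise variance, and computes the detection probability of the grouped test as well as the smallest cluster size that reaches a target detection probability.

```python
import math
from statistics import NormalDist


def detection_probability(mu, k, threshold):
    """P(sqrt(k) * mu + N(0, 1) > threshold) for k aggregated pixels of equal strength mu."""
    return NormalDist().cdf(math.sqrt(k) * mu - threshold)


def min_cluster_size(mu, threshold, target=0.9):
    """Smallest k with detection_probability(mu, k, threshold) >= target (requires mu > 0)."""
    needed = threshold + NormalDist().inv_cdf(target)  # sqrt(k) * mu must reach this value
    if needed <= 0:
        return 1
    return math.ceil((needed / mu) ** 2)


if __name__ == "__main__":
    bonferroni = 4.19096  # threshold of Scenario 4 (N = 3600, alpha = 0.05)
    for k in (1, 4, 9, 25):
        print(k, round(detection_probability(2.0, k, bonferroni), 4))
    print(min_cluster_size(2.0, bonferroni))
```

A weak signal that is practically undetectable in a single pixel at the Bonferroni threshold becomes detectable once a moderate number of pixels is aggregated.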


11.2.6 Scanning 


In a way, the two approaches of aggregating data over the entire image
(Scenario 1) and performing pixel-wise tests (as done in Scenarios 2-4) are the
most extreme scenarios. As a rule of thumb, aggregation makes detection easier at
the cost of losing spatial precision, whereas pixel-wise testing provides the highest
possible spatial precision but makes detection more difficult (after Bonferroni level
adjustment, as we have seen in Scenario 4; recall that since the tests are independent,
we know that there is no substantially better way to control the FWER). In a next step
we will combine both ideas. We test on various squares of different sizes to achieve
accuracy (small regions) where possible and gain detection power (larger regions)
where the signal is not strong enough to be detected pixel-wise, i.e., on small spatial
scales. As the system of all subsquares of an image consists of many overlapping
squares, we have to deal with locally highly dependent test statistics. Table 11.1
illustrates this effect, presenting simulated values of the family-wise error rate, based
on 1000 simulation runs each, with preassigned value $\alpha = 0.05$. Squares of size
h × h, $h \in \{1, 2, 3, 4, 5\}$, in an image of 60 × 60 pixels are considered. The parameter h is
referred to as a spatial scale. The results of this small simulation study demonstrate that
the Bonferroni correction is much too strict if we aggregate data in larger squares. The
following scenario is tailored towards dealing with this specific type of dependency
structure and is called multiscale scanning. Here, the level adjustment is made in an
optimal, spatially adaptive way, i.e., such that the thresholds are large enough
so that the FWER is controlled, but at the same time so small that smaller thresholds
could no longer guarantee the control of the FWER. The key is now to exploit that
the system of all h × h squares fitting into the n × n image is highly redundant. For
instance, if a square is shifted one pixel to the right, say, both squares share most of
their pixels and their contents should not be treated as independent.

Table 11.1 Simulated values of FWER at nominal level $\alpha = 0.05$ for a matrix of local averages
of h × h pixels

h × h    1 × 1    2 × 2    3 × 3    4 × 4    5 × 5    10 × 10
Observed error rate

We discussed in Sect. 11.2.5 that instead of the Bonferroni threshold $z_{1-\alpha/N}$, the $(1 - \alpha)$-quantile of the
distribution of the maximum of N independent standard normal random variables
under the global null hypothesis, H, could be used as a threshold as well. This idea
can be transferred to this setting by using the $(1 - \alpha)$-quantile of

\[ \max_{1 \le h \le h_{\max}} \; \max_{h \times h \text{ squares}} \; w(h)\bigl( T_{h \times h \text{ square}}(Y) - w(h) \bigr), \]


where w(h) is a size-dependent correction term, given by 


w(h) = 2m% +7In (au) nz X. (11.10) 


Under H, the quantiles can be simulated as described in Algorithm 1. Recall that
in Lemma 11.2 it was shown that the quantile $z_{1-\alpha/N}$, and therefore also the quantile
of the maximum, $z_{(1-\alpha)^{1/N}}$, are approximately of size $\sqrt{2\log(N)}$. When pixels are
aggregated over h × h squares, the corresponding quantiles can be shown to be
of first asymptotic order $\sqrt{2\log(N/h^2)}$ (the leading term of w(h) in (11.10), see
Theorem 11.2 for details), which corresponds to the case of $N/h^2$ independent
tests. This is incorporated into the construction of the thresholds as described in
Algorithm 1.


Algorithm 1: Simulation of the thresholds

Parameters: number of Monte Carlo runs $M \in \mathbb{N}$, largest size $h_{\max} \in \mathbb{N}$, significance
level $\alpha \in (0, 1)$

1  for i = 1, 2, ..., M do
2      Draw i.i.d. data $Y_j \sim \mathcal{N}(0, 1)$ for $1 \le j \le n$;
3      for $1 \le h \le h_{\max}$ do
4          Compute all test statistics $T_{h \times h \text{ square}}(Y)$;
5          Compute all $w(h)(T_{h \times h \text{ square}}(Y) - w(h))$;
6          Save their maximal value in $q_h$;
7      Set $t_i := \max_{1 \le h \le h_{\max}} q_h$;
8  Sort the values $t_i$ such that $t_1 \ge \ldots \ge t_M$;
9  Choose $j \in \{1, \ldots, M\}$ such that $j/M \le \alpha < (j + 1)/M$;
10 Set $q^h_{1-\alpha} := t_j/w(h) + w(h)$ for $1 \le h \le h_{\max}$;

In line 10 of Algorithm 1, the size-dependent thresholds $q^h_{1-\alpha} = t_j/w(h) + w(h)$
are defined. Comparing each $T_{h \times h \text{ square}}(Y)$ to $q^h_{1-\alpha}$ yields a multiplicity-adjusted
multiple test procedure. Note that in Algorithm 1 the quantile of the maximum over
all locally correlated test statistics under the global null hypothesis is approximated.
This way, the dependence structure is taken into account precisely.
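The Monte Carlo scheme of Algorithm 1 can be sketched in plain Python as follows. This is an illustration under stated assumptions, not a reference implementation: the correction term `w` below uses only the leading term $\sqrt{2\log(N/h^2)}$ mentioned in the text instead of the full expression (11.10), the simulated maxima are sorted in decreasing order so that the selected value is the empirical $(1 - \alpha)$-quantile, and the summed-area table is merely one convenient way to obtain all square sums. All function names are hypothetical.

```python
import math
import random


def summed_area_table(image):
    """Table S with S[r][c] = sum of image[0:r][0:c]; gives any square sum in O(1)."""
    n = len(image)
    table = [[0.0] * (n + 1) for _ in range(n + 1)]
    for r in range(n):
        row_sum = 0.0
        for c in range(n):
            row_sum += image[r][c]
            table[r + 1][c + 1] = table[r][c + 1] + row_sum
    return table


def max_square_statistic(table, n, h):
    """Maximum of T = (sum over square) / h over all h x h squares."""
    return max(
        table[r + h][c + h] - table[r][c + h] - table[r + h][c] + table[r][c]
        for r in range(n - h + 1)
        for c in range(n - h + 1)
    ) / h


def w(h, n_pixels):
    """ASSUMPTION: only the leading term sqrt(2 log(N / h^2)) of the correction (11.10)."""
    return math.sqrt(2.0 * math.log(n_pixels / (h * h)))


def simulate_thresholds(n=60, h_max=5, alpha=0.05, runs=100, seed=0):
    """Monte Carlo thresholds q^h_{1-alpha}, h = 1, ..., h_max, for an n x n image."""
    if not 1 <= h_max < n:
        raise ValueError("need 1 <= h_max < n so that w(h) > 0")
    j = math.floor(alpha * runs)
    if j < 1:
        raise ValueError("need runs >= 1 / alpha")
    rng = random.Random(seed)
    maxima = []
    for _ in range(runs):
        noise = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(n)]
        table = summed_area_table(noise)
        maxima.append(max(
            w(h, n * n) * (max_square_statistic(table, n, h) - w(h, n * n))
            for h in range(1, h_max + 1)
        ))
    maxima.sort(reverse=True)  # t_1 >= ... >= t_M
    t_j = maxima[j - 1]        # at most j - 1 simulated maxima exceed t_j
    return {h: t_j / w(h, n * n) + w(h, n * n) for h in range(1, h_max + 1)}


if __name__ == "__main__":
    print(simulate_thresholds())
```

Because the correction term is simplified here, the resulting numbers should not be expected to coincide with the thresholds reported in Table 11.2; a larger number of runs also gives a more stable quantile estimate.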


Scenario 5 (Unknown position, multiscale scanning) We now aggregate test results
for several different scanning tests. We consider testing each pixel, as well as testing
each 2 × 2, 3 × 3, 4 × 4 and 5 × 5 square. In total these are 16,830 tests. We now
adjust the level in a way that accounts for local correlations. We fix $\alpha = 0.05$ and
calculate all test statistics $T_{h \times h \text{ square}}(Y)$ (see (11.5)). The local hypotheses $H_{h \times h \text{ square}}$
are

\[ H_{h \times h \text{ square}} : \text{“$\mu_i = 0$ in $h \times h$ square.”} \tag{11.11} \]

Each $T_{h \times h \text{ square}}(Y)$ is compared to the size-dependent threshold $q^h_{1-\alpha}$, which
has been generated according to Algorithm 1 and is listed in Table 11.2. We reject
the local hypothesis that there is no signal in a particular h × h square if the corresponding
test statistic is larger than the threshold, that is, if

\[ T_{h \times h \text{ square}}(Y) = \frac{1}{h} \sum_{i \,\in\, h \times h \text{ square}} Y_i > q^h_{1-\alpha}. \tag{11.12} \]

Table 11.2 Scale-dependent quantiles for the scanning test with windows of variable sizes

α       $q^1_{1-\alpha}$   $q^2_{1-\alpha}$   $q^3_{1-\alpha}$   $q^4_{1-\alpha}$   $q^5_{1-\alpha}$   Bonferroni for 16,830 tests
0.1     5.115    4.760    4.531    4.345    4.208    4.380
0.05    5.267    4.921    4.698    4.527    4.385    4.528
0.01    5.581    5.2538   5.043    4.883    4.750    4.875


All significant squares are stored and finally, after all square-wise comparisons
have been made, the findings for the different sizes are color-coded and, for each pixel,
the color corresponding to the smallest significant square containing it is plotted. The results
are shown in Fig. 11.8. One big advantage of this approach is that the weak signal
is now also completely included in the segmentation, in contrast to even the unadjusted
approach of Scenario 2 (compare the lower left plots of Figs. 11.4 and 11.8). Also, the
color-coding visualizes regions of strong signal and therefore contains “structural
information” on the data.
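The scanning step of Scenario 5 can be sketched as follows. The thresholds are the values for $\alpha = 0.05$ listed in Table 11.2, which were generated for 60 × 60 images; applying them to images of another size, as in the small test case below, is done purely for illustration. The function names and the label convention (0 for pixels not covered by any significant square) are assumptions made here.

```python
# Thresholds q^h_{1-alpha} for alpha = 0.05 as listed in Table 11.2 (60 x 60 images).
THRESHOLDS_ALPHA_005 = {1: 5.267, 2: 4.921, 3: 4.698, 4: 4.527, 5: 4.385}


def number_of_tests(n, scales):
    """Total number of h x h squares in an n x n image over all scales h."""
    return sum((n - h + 1) ** 2 for h in scales)


def multiscale_scan(image, thresholds):
    """Map each pixel to the side length of the smallest significant square
    covering it (0 if no square containing the pixel is significant)."""
    n = len(image)
    table = [[0.0] * (n + 1) for _ in range(n + 1)]  # summed-area table
    for r in range(n):
        row_sum = 0.0
        for c in range(n):
            row_sum += image[r][c]
            table[r + 1][c + 1] = table[r][c + 1] + row_sum
    labels = [[0] * n for _ in range(n)]
    for h in sorted(thresholds, reverse=True):  # small scales overwrite large ones
        for r in range(n - h + 1):
            for c in range(n - h + 1):
                total = (table[r + h][c + h] - table[r][c + h]
                         - table[r + h][c] + table[r][c])
                if total / h > thresholds[h]:  # rejection rule (11.12)
                    for rr in range(r, r + h):
                        for cc in range(c, c + h):
                            labels[rr][cc] = h
    return labels


if __name__ == "__main__":
    print(number_of_tests(60, range(1, 6)))
    toy = [[1.0 if 3 <= r < 8 and 3 <= c < 8 else 0.0 for c in range(12)]
           for r in range(12)]
    for row in multiscale_scan(toy, THRESHOLDS_ALPHA_005):
        print("".join(str(v) for v in row))
```

In the noise-free toy image a 5 × 5 block of weak signal (strength 1 per pixel) is invisible at all scales below 5 and is found only by the 5 × 5 test, which illustrates how the larger scales recover weak but spatially extended signal.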


The procedure in Scenario 5 is such that the FWER is still controlled in a strong 
sense, although the thresholds can be chosen smaller than in a Bonferroni approach. 
This is much more so if N and h get larger, but is visible starting from h = 4, which 
matches the values given in Table 11.1. This was possible due to the strong local 
correlations between tests. Roughly speaking, for each size of the moving window a 
Bonferroni-type adjustment is made for the (maximum) number of non-overlapping 
squares of that size, which is a considerable relaxation. Remarkably, the price for
including many different sizes is extremely small. More theoretical details can be
found in Sect. 11.2.7. 

To conclude this section, it should be stressed that in many situations we do not
encounter rectangular signals; however, small rectangles can be considered as building
blocks for more complex structures. If specific shape information is available,
this can be incorporated into the testing procedure as long as the regions are not
too irregular and the set of regions satisfies a Vapnik-Chervonenkis-type complexity
condition (see [8] for more details). The literature on multiscale scanning methods
is vast. In the particular context of imaging, the reader may also consult [9-12] for
related ideas.

Fig. 11.8 Noisy signals (upper row) and the corresponding test results from Scenario 5 (lower
row). Significant 5 × 5 squares are plotted in yellow. Significant 1 × 1 to 4 × 4 squares are plotted
in green with increasing brightness. For each pixel, the smallest square which was found significant
was plotted. Insignificant regions are coloured in blue


Multiscale Scanning 

With a probability guarantee of $1 - \alpha$, all of the RoIs chosen in the multiscale
scanning procedure described in Scenario 5 are valid. Hence, we obtain localized
RoIs where the signal is sufficiently strong and profit from aggregation, as
described in Sect. 11.2.5.1, where the signal is weak and point-wise detection
is too difficult.


11.2.7 Theory for the Multiscale Scanning Test 


The following theorem is the theoretical foundation for Scenario 5. 


Theorem 11.2 Assume that an $n \times n$ array of independent $\mathcal{N}(\mu_i, 1)$ variables is
observed and $\mathcal{H} \subset \{1, \dots, n\}$ is a set of side lengths of squares. Denote for
$h \in \mathcal{H}$ by $\mathcal{S}(h)$ the set of all $h \times h$ squares. Let $N = n^2$, let $w(h)$ be as
defined in (11.10) and let further $q_{1-\alpha}$ denote the $(1-\alpha)$-quantile of

$$\max_{h \in \mathcal{H}} \max_{S \in \mathcal{S}(h)} w(h)\,\bigl(T_S(Y) - w(h)\bigr) \tag{11.13}$$

under the global hypothesis $H$: “no signal in any of the squares”. Reject each
hypothesis $H_{h \times h\,\text{square}}$ (see (11.11)) for which

$$T_{h \times h\,\text{square}}(Y) > \frac{q_{1-\alpha}}{w(h)} + w(h). \tag{11.14}$$

(i) This yields a multiple test for which the FWER at level $\alpha$ is controlled
asymptotically (as $|\mathcal{H}|/n \to 0$, $n \to \infty$) in the strong sense.
(ii) This test is minimax optimal in detecting sparse rectangular regions of the signal. 


Claims (i) and (ii) follow from Theorems 7 and 2 in [1]. Roughly speaking, the essence
of the previous theorem is that we only need multiplicity control for approximately
$n^2/h^2$ (corresponding to the number of independent) tests instead of $(n - h + 1)^2$
(corresponding to the actual number of all) tests. Control of the FWER in the strong
sense means that all significant squares can be used in the final segmentation (lower
row of Fig. 11.8).
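To make the two counts concrete, the following sketch computes the statistics of all h x h squares and compares the number of tests actually performed with the number of non-overlapping squares. It is an illustration under stated assumptions: the function name `scan_statistics` is hypothetical, the statistic is taken to be the sum over the square divided by h (the scaled indicator form used in Sect. 11.2.8), and n = 60, h = 5 are example values.

```python
def scan_statistics(y, h):
    """Sum of y over S divided by h, for every h x h square S of the array y."""
    n = len(y)
    # Integral image: c[i][j] holds the sum of y over rows < i and columns < j.
    c = [[0.0] * (n + 1) for _ in range(n + 1)]
    for i in range(n):
        for j in range(n):
            c[i + 1][j + 1] = y[i][j] + c[i][j + 1] + c[i + 1][j] - c[i][j]
    m = n - h + 1
    return [[(c[i + h][j + h] - c[i][j + h] - c[i + h][j] + c[i][j]) / h
             for j in range(m)] for i in range(m)]


n, h = 60, 5
all_squares = (n - h + 1) ** 2      # number of tests actually performed
disjoint_squares = (n // h) ** 2    # about n^2 / h^2 non-overlapping squares

y = [[1.0] * n for _ in range(n)]   # constant toy image
stats = scan_statistics(y, h)
```

For these values there are 3136 overlapping squares but only 144 non-overlapping ones, which is the gap that the calibration in Theorem 11.2 exploits.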


In this chapter we mainly focused on control of the FWER; however, weaker means
of error control are of interest as well. A very prominent one is the false discovery
rate (FDR, [13]), which we briefly discuss in Sect. 11.2.9.


11.2.8 Deconvolution and Scanning 


In photonic imaging additional difficulties arise. Firstly, we have to deal with
non-Gaussian and non-i.i.d. data (see Chap. 4), e.g., following a Poisson distribution
with inhomogeneous intensities $\lambda_i$. Then, as long as the intensity is not too small,
a Gaussian approximation validates model (11.2) as a reasonable proxy for such
situations. A formal justification for the corresponding multiscale tests is based on
recent results by [14], for details see [1]. The price to pay for such an approximation
is a lower bound on the sizes of testing regions that can be used, due to the fact that
several data points (of logarithmic order in $n$) need to be aggregated so that a Gaussian
approximation is valid. For ease of notation, we only discussed the Gaussian case in
Sect. 11.2.7; generalizations to other distributions can be found in [8].

Secondly, convolution with the PSF of the imaging device induces blur. The first
row of Fig. 11.9 shows the convolved synthetic images that were shown in the upper
row of Fig. 11.2, where the images in the central row are noisy versions of these
convolved images. Note that some structures are no longer identifiable by eye after
convolution. When the multiscale scanning approach of Scenario 5 is applied naively
to the convolved data (central row of Fig. 11.9), the result (lower row of Fig. 11.9)
demonstrates that this is indeed not a competitive strategy, and it strongly suggests
taking the convolution into account.

Fig. 11.9 Signals after convolution (upper row), noisy version (central row) and the corresponding
test results from Scenario 5, naively applied to the convolved data (lower row). Significant 5 × 5
squares are plotted in yellow. Significant 1 × 1 to 4 × 4 squares are plotted in green with increasing
brightness. For each pixel, the smallest square which was found significant was plotted. Insignificant
regions are coloured in blue

We now briefly sketch how to adapt the multiscale scanning procedure
(Scenario 5) to the convolution setting. Notice that in the case of data (11.2), we
can write the test statistic (11.5) for a particular square $S$ as

$$T_S(Y) = \langle \mathcal{I}_S, Y \rangle,$$

where $Y = (Y_1, \dots, Y_n)$ denotes the data vector and $\mathcal{I}_S$ denotes the scaled indicator
function on $S$, i.e., $\mathcal{I}_S(j) = 1/\sqrt{|S|}$ if $j \in S$ and 0 else. Now, the indicator functions
are considered as a system of probe functions, which are tested on the data $Y$. In case
of convolution with the PSF $k$ (e.g. of the microscope), model (11.2) turns into

$$Y_i^* = (\mu * k)_i + \varepsilon_i, \quad i = 1, \dots, n, \tag{11.15}$$

where “$*$” denotes convolution. The goal is to find a probe function, acting on the
convolved data, denoted as $\mathcal{I}_S^*$, such that

$$\langle \mathcal{I}_S^*, Y^* \rangle \approx \langle \mathcal{I}_S, Y \rangle,$$

that is, $\mathcal{I}_S^*$ should locally deconvolve. Let $\mu = (\mu_1, \dots, \mu_n)$. Then, if $\mathcal{F}$ denotes the
discrete Fourier transform, by Plancherel isometry and the convolution theorem

$$\langle \mathcal{I}_S, \mu \rangle = \left\langle \mathcal{F}^{-1}\!\left(\frac{\mathcal{F}\mathcal{I}_S}{\mathcal{F}k}\right), \mu * k \right\rangle.$$

This means that (provided $\mathcal{F}k \neq 0$)

$$\mathcal{I}_S^* = \mathcal{F}^{-1}\!\left(\frac{\mathcal{F}\mathcal{I}_S}{\mathcal{F}k}\right) \tag{11.16}$$

is a reasonable choice of a probe system for the data (11.15) and a statistic that adapts
to the convolution is given by

$$T_S^*(Y^*) = \langle \mathcal{I}_S^*, Y^* \rangle.$$

Scenario 5 can now be performed, following Algorithm 1 to derive suitable
thresholds, replacing $\mathcal{I}_S$ by $\mathcal{I}_S^*$, and the FWER is controlled. More precisely, it can be
shown that Theorem 11.2 also applies in this scenario (see [1] for details). Figure 11.10d
shows the result of this adapted test procedure (MISCAT) applied to our original
data (Fig. 11.10a). As a comparison, we also applied Scenario 5 naively to the data
set (Fig. 11.10f). Analogously to [15], $\mathcal{I}_S$ can be chosen such that MISCAT with $\mathcal{I}_S^*$
performs optimally in terms of detection power.
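The defining property of the deconvolving probe, namely that testing it on the blurred signal reproduces the value of the plain indicator on the unblurred signal, can be checked numerically. The sketch below is an illustration only and makes several assumptions: it works in one dimension with circular convolution, uses a naive DFT from the standard library (in practice an FFT routine would be used), and takes a hypothetical kernel whose Fourier transform is real, symmetric and nonzero, so no complex conjugation is needed in the division. All names (`dft`, `k_hat`, `probe_star`) are made up for the example.

```python
import cmath


def dft(x, inverse=False):
    """Naive discrete Fourier transform (O(n^2), for illustration only)."""
    n = len(x)
    sign = 1 if inverse else -1
    out = [sum(x[t] * cmath.exp(sign * 2j * cmath.pi * f * t / n) for t in range(n))
           for f in range(n)]
    return [v / n for v in out] if inverse else out


n = 16
# Hypothetical kernel, specified through its real, symmetric, nonzero transform.
freqs = [f if f <= n // 2 else f - n for f in range(n)]
k_hat = [1.0 / (1.0 + 0.09 * f * f) for f in freqs]

mu = [0.0] * n
mu[5:8] = [2.0, 3.0, 1.0]                      # toy signal
# Circular convolution mu * k, computed in the Fourier domain.
blurred = [v.real for v in dft([m * k for m, k in zip(dft(mu), k_hat)], inverse=True)]

S = range(4, 8)                                # interval of length h = 4
probe = [1.0 / len(S) ** 0.5 if j in S else 0.0 for j in range(n)]
# Deconvolving probe: inverse transform of (F probe) / (F k).
probe_star = [v.real for v in dft([p / k for p, k in zip(dft(probe), k_hat)], inverse=True)]

direct = sum(p * m for p, m in zip(probe, mu))              # indicator tested on mu
via_blur = sum(p * b for p, b in zip(probe_star, blurred))  # new probe tested on mu * k
```

Up to rounding, `direct` and `via_blur` agree, which is the Plancherel argument above in numerical form. For a kernel whose transform is complex, the division would involve the complex conjugate of the transform; that case is not covered by this sketch.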


Deconvolution and scanning 

In convolution problems sums of pixel values over spatial regions (e.g. squares) 
will be replaced by probe functionals over the pixels (weighted sums) which 
can be designed in an optimal way for a given convolution K. The resulting 
multiscale test scans over all probe functionals which results in substantially 
more precise segmentation results (for a direct comparison see lower left and 
lower right panel of Fig. 11.10). It still controls the FWER.


Fig. 11.10 (Figure 2 in [1]) Experimental data and corresponding 90% significance maps computed
by different tests. The color-coding of the significance maps always shows the size of smallest
significance in nm², cf. the main text. a-c data and zoomed regions, d MISCAT, e a single scale
test with deconvolution, f a multiscale scanning test without deconvolution


11.2.9 FDR Control 


As discussed in the previous sections of this chapter, as the sample size (and
therefore the number of tests) increases, the control of the FWER becomes more
difficult, which may result in low detection power, e.g., in three dimensional imaging.
Therefore, a strategy to obtain less conservative procedures of error control is to relax
the FWER. The most prominent relaxation is the false discovery rate (FDR [13]),
defined as

$$\mathrm{FDR} = \mathbb{E}\left[\frac{\#\text{false rejections}}{\max\{\#\text{all rejections},\, 1\}}\right],$$

that is, the average proportion of false rejections among all rejections. Hence, in
contrast to the FWER this criterion scales with the number of rejections. The control
of the FDR is a weaker requirement than the control of the FWER in general.
Procedures that control the FDR are often written in terms of p-values. In the situation
of the Z-test with test statistics $T_{h \times h\,\text{square}}(Y)$ as in (11.12) the p-values are given as

$$p_{h \times h\,\text{square}} = 1 - \Phi\bigl(T_{h \times h\,\text{square}}(Y)\bigr), \tag{11.17}$$

where $\Phi$ denotes the cumulative distribution function of the standard normal
distribution. The smaller the p-value, the stronger the evidence that the null hypothesis
should be rejected.

Fig. 11.11 Noisy signal (left), test result of pixel-wise tests after Bonferroni adjustment (middle)
and test results from Scenario 6 (right) with FDR control. In both multiple testing procedures
$\alpha = 0.05$. Significant pixels are marked green, insignificant regions are coloured in blue


Benjamini-Hochberg Procedure ([13]) Consider a multiple testing procedure
consisting of independent tests with p-values $p_1, \dots, p_N$. Sort the p-values increasingly,
$p_{(1)} \le p_{(2)} \le \dots \le p_{(N)}$, and reject all null hypotheses for which $p_i \le \alpha \hat{k}/N$, where
$\hat{k} = \max\{k \mid p_{(k)} \le \alpha k/N\}$.

Reference [16] already proposed the above procedure but pointed out that this
approach lacks a theoretical justification, which has been given by [13], who showed
that $\mathrm{FDR} \le \frac{N_0}{N}\alpha$, where $N_0$ denotes the number of true null hypotheses.
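The step-up rule can be written down directly. The following is a minimal sketch using only the standard library; the function names `p_value` and `benjamini_hochberg` and the example p-values are assumptions made for illustration. The p-value helper uses the identity 1 - Phi(t) = erfc(t / sqrt(2)) / 2 for the standard normal distribution.

```python
import math


def p_value(t):
    """One-sided p-value 1 - Phi(t) of a Z-test statistic t."""
    return 0.5 * math.erfc(t / math.sqrt(2.0))


def benjamini_hochberg(p_values, alpha):
    """Return a list of booleans marking the rejected null hypotheses."""
    n = len(p_values)
    order = sorted(range(n), key=lambda i: p_values[i])
    k_hat = 0
    # Largest rank k whose ordered p-value lies below alpha * k / n.
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= alpha * rank / n:
            k_hat = rank
    rejected = [False] * n
    for idx in order[:k_hat]:
        rejected[idx] = True
    return rejected


# Step-up behaviour: the largest p-value passes its threshold, so all are rejected,
# although the second and third smallest fail their own thresholds.
step_up = benjamini_hochberg([0.01, 0.04, 0.03, 0.045], alpha=0.05)
# A sparser example: only the two smallest p-values are rejected.
sparse = benjamini_hochberg(
    [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216], alpha=0.05)
```

The first example shows why the maximum over k matters: a hypothesis can be rejected even if its own ordered p-value exceeds its own threshold, as long as a larger one passes.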


Scenario 6 (Benjamini-Hochberg (BH) Procedure) In the situation of Scenario 3,
we also performed a BH procedure for all 60 × 60 = 3600 entries of the third test
image (see left panel of Fig. 11.11). The result is displayed in the rightmost panel
of Fig. 11.11, while in the centre, for comparison, the result of the Bonferroni
procedure on the same data set is displayed. More parts of the signal have been
found; however, several positives are still missed and a false discovery is
included.


There is a vast literature on FDR control and many generalizations have been
proposed. For instance, if $N_0/N$ is much smaller than 1, corresponding to the case of a
non-sparse signal, the procedure controls the FDR at a much smaller level than $\alpha$, and
refined versions of the BH procedure in which $N_0/N$ is estimated from the data have
been proposed (see, e.g., [17, 18] and the references given therein).

While the BH procedure grants control of the FDR in test Scenarios 2 and 3 due
to independence between pixels, the situation in Scenario 5 is more delicate due to
the strong local correlations, in particular in the presence of convolution, where a
suitable FDR-procedure is still an open problem and currently investigated by the
authors. We stress that, while FDR-control under specific dependency structures has
been investigated by many authors (e.g., [19, 20] and the references given therein),
none of the existing methods provides a procedure tailored to deconvolution problems
as they occur in photonic imaging. The construction of such adjusted methods is a
worthwhile focus for future research.


11.3 Statistical Multiscale Estimation 


If one is further interested in the recovery (estimation) of the unknown signal, the
multiscale testing procedure developed in Sects. 11.2.6 and 11.2.8 actually provides
a collection of feasible candidates for this task in the sense that all signals which
fall in the acceptance region of one of the aforementioned tests can be considered
as “likely” as they cannot be rejected by such a scanning test. More precisely, if we
assume model (11.15), any signal $\tilde{\mu}$ which satisfies

$$\max_{h \in \mathcal{H}} \max_{S \in \mathcal{S}(h)} w(h)\,\bigl(\langle \mathcal{I}_S^*, Y^* - \tilde{\mu} * k \rangle - w(h)\bigr) \le q_{1-\alpha} \tag{11.18}$$

cannot be rejected. Here $\mathcal{H}$ and $\mathcal{S}(h)$ are defined in Theorem 11.2, and $q_{1-\alpha}$ is the
$(1 - \alpha)$-quantile of the left hand side of (11.18) with $(Y^* - \tilde{\mu} * k)$ being replaced by
noise $\varepsilon$, $w(h)$ is the scale correction term given in (11.10), and $\mathcal{I}_S^*$ is as in (11.16).
Among all the candidates $\tilde{\mu}$ lying in (11.18), we will pick the most regular estimate.
This is done by means of a (convex) functional $S(\cdot)$, defined on a domain $D$
for $\mu$, which encodes prior information about the unknown signal, e.g. sparsity or
smoothness. Thus, the final estimator $\hat{\mu}$ is defined as

$$\hat{\mu} \in \operatorname*{argmin}_{\tilde{\mu}} S(\tilde{\mu}) \quad \text{subject to } \tilde{\mu} \text{ satisfies (11.18)}. \tag{11.19}$$

Because of the choice of $q_{1-\alpha}$, we readily obtain the regularity guarantee

$$\mathbb{P}_\mu\bigl(S(\hat{\mu}) \le S(\mu)\bigr) \ge 1 - \alpha \quad \text{uniformly over all } \mu \in D,$$

i.e., the resulting estimator is at least as regular as the truth with probability $1 - \alpha$,
whatever the configuration $\mu$ of the truth is. Furthermore, the remaining residuum
$Y^* - \hat{\mu} * k$ is accepted as pure noise by the multiscale procedure described in
Sect. 11.2.8.

Before we discuss possible ways to solve the minimization problem (11.19),
note that $\langle \mathcal{I}_S^*, Y^* - \tilde{\mu} * k \rangle = \langle \mathcal{I}_S^*, Y^* \rangle - \langle \mathcal{I}_S, \tilde{\mu} \rangle$ in (11.18), and hence the computation
can be sped up by avoiding convolutions between $\tilde{\mu}$ and $k$. Next we emphasize that
the discretization of (11.19) has the form

$$\operatorname*{argmin}_{\tilde{\mu}} S(\tilde{\mu}) \quad \text{subject to } \underline{\lambda} \le K\tilde{\mu} \le \overline{\lambda}, \tag{11.20}$$

where $\underline{\lambda}$, $\overline{\lambda}$ are vectors, $K$ is a matrix, and “$\le$” acts element-wise. Thus, whenever $S$ is
convex, the whole problem is convex (although non-smooth) and can be solved
by many popular methods. In Algorithm 2 we give one possibility, which arises from
applying the primal-dual hybrid gradient method [21] to an equivalent reformulation
of the first order optimality conditions of (11.20) (which are necessary and sufficient
by convexity).


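Algorithm 2: Primal dual hybrid gradient method for (11.20)

Parameters: Set $\sigma, \tau > 0$ s.t. $\sigma\tau\|K\|^2 < 1$, and $\theta \in [0, 1]$
Initialization: Set $\bar{\mu}_0 = \mu_0 \in D(K)$ and $v_0 \in R(K)$

1 for n = 1, 2, ... do
2     $\tilde{v}_n = v_{n-1} + \sigma K \bar{\mu}_{n-1}$;
3     $v_n = \max\{\tilde{v}_n - \sigma\overline{\lambda}, 0\} + \min\{\tilde{v}_n - \sigma\underline{\lambda}, 0\}$;
4     $\mu_n = \operatorname{argmin}_{\mu} \tfrac{1}{2}\|\mu - (\mu_{n-1} - \tau K^* v_n)\|^2 + \tau S(\mu)$;
5     $\bar{\mu}_n = \mu_n + \theta(\mu_n - \mu_{n-1})$;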

Algorithm 2 relies on efficient computations of the so-called proximal operator of
$S$, see line 4. In most cases, it either has an analytic form, e.g. if $S$ is an $\ell^p$-norm
($1 \le p \le \infty$), or can be computed by an efficient solver, e.g. if $S$ is the total variation
semi-norm [22].
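A primal-dual hybrid gradient iteration of this kind can be tried out on a toy problem. The sketch below is an illustration under stated assumptions, not the authors' code: it uses the standard form of the iteration (dual ascent step, clipping of the dual variable to handle the two-sided constraint, proximal step scaled by the primal step size, extrapolation), a hypothetical regularizer equal to half the squared Euclidean norm, whose proximal operator is a simple rescaling, and a 1 x 2 matrix K. All names (`pdhg_box`, `prox_half_sq_norm`) are made up.

```python
def pdhg_box(K, lo, hi, prox_S, sigma, tau, theta=1.0, iters=300):
    """Minimize S(mu) subject to lo <= K mu <= hi (element-wise) by PDHG.

    K is a list of rows; prox_S(point, tau) must return the minimizer of
    0.5 * ||mu - point||^2 + tau * S(mu).
    """
    m, n = len(K), len(K[0])
    mu = [0.0] * n
    mu_bar = mu[:]
    v = [0.0] * m
    for _ in range(iters):
        # Dual ascent step using the extrapolated primal variable.
        v_tilde = [v[i] + sigma * sum(K[i][j] * mu_bar[j] for j in range(n))
                   for i in range(m)]
        # Proximal step for the conjugate of the box-constraint indicator.
        v = [max(t - sigma * hi[i], 0.0) + min(t - sigma * lo[i], 0.0)
             for i, t in enumerate(v_tilde)]
        # Primal proximal step on the regularizer S.
        point = [mu[j] - tau * sum(K[i][j] * v[i] for i in range(m)) for j in range(n)]
        mu_new = prox_S(point, tau)
        # Extrapolation.
        mu_bar = [a + theta * (a - b) for a, b in zip(mu_new, mu)]
        mu = mu_new
    return mu


def prox_half_sq_norm(point, tau):
    """Proximal operator of tau * 0.5 * ||.||^2: a rescaling."""
    return [p / (1.0 + tau) for p in point]


# Toy problem: minimize 0.5 * ||mu||^2 subject to 2 <= mu_1 + mu_2 <= 3.
# Here ||K||^2 = 2, so sigma * tau * ||K||^2 = 0.5 < 1.
mu_hat = pdhg_box([[1.0, 1.0]], lo=[2.0], hi=[3.0],
                  prox_S=prox_half_sq_norm, sigma=0.5, tau=0.5)
```

The minimizer of the toy problem is (1, 1), since the lower bound is active and the squared norm is smallest on the diagonal. Replacing `prox_half_sq_norm` by the proximal operator of another convex functional changes only the primal step.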

One alternative to Algorithm 2 is the alternating direction method of multipliers 
(ADMM), which can be applied directly to (11.20) and is compatible with any 
convex functional S [23]. However, Algorithm 2 avoids the projection onto the 
intersection of convex sets, and turns out to be much faster in practice if step 4 in 
Algorithm 2 can be efficiently computed. For further algorithms relevant for this 
problem, see Chaps. 6 and 12. 

We stress that a crucial part of the estimator $\hat{\mu}$ in (11.18)-(11.19) is the choice of
probe functionals $\mathcal{I}_S^*$ from Sect. 11.2.8. In Fig. 11.12, this estimator $\hat{\mu}$ is referred to as
MiScan (short for multiscale image scanning), whereas MrScan (short for multiscale
residual scanning) denotes the estimator of a similar form as $\hat{\mu}$ but with $\mathcal{I}_S^*$ being
replaced by $\mathcal{I}_S$, see [23-26], i.e., the convolution is not explicitly taken into account
in the probe functional. MiScan recovers significantly more features over a range of
scales (i.e., various sizes) compared to MrScan.

There is good theoretical understanding of the estimator $\hat{\mu}$ given by (11.18)-(11.19) for
the regression model (11.2), that is, $k = \delta_0$, the Dirac delta function, in model (11.15).
In case of $S$ being Sobolev norms, [27] shows the minimax optimality of $\hat{\mu}$ for Sobolev
functions for fixed smoothness, and [28] further show the optimality over Sobolev
functions with varying smoothness (adaptation). In case of $S$ being the total variation
semi-norm, [29] show the minimax optimality of such an estimator for functions
with bounded variation. All the results above are established for $L^p$-risks
($1 \le p \le \infty$). For the more general model (11.15), [30] provide some asymptotic analysis
with respect to a relatively weak error measure, the Bregman divergence. A detailed
analysis of MiScan exploring the probe functionals in (11.16) in a convolution model
is still open and currently investigated by the authors.

Fig. 11.12 Comparison of MiScan (left) and MrScan (right) on a deconvolution problem
(SNR = 100, and the convolution kernel $k$ satisfying $\mathcal{F}k = 1/(1 + 0.09\|\cdot\|^2)$). MiScan is
defined by (11.18)-(11.19); MrScan is similar to MiScan but with $\mathcal{I}_S^*$ replaced by $\mathcal{I}_S$. For both
methods, the regularization functional $S$ is chosen as the total variation semi-norm


Chapter 12
Efficient, Quantitative Numerical
Methods for Statistical Image
Deconvolution and Denoising


D. Russell Luke, C. Charitha, Ron Shefi and Yura Malitsky 


Abstract We review the development of efficient numerical methods for statistical 
multi-resolution estimation of optical imaging experiments. In principle, this involves 
constrained linear deconvolution and denoising, and so these types of problems can be 
formulated as convex constrained, or even unconstrained, optimization. We address 
two main challenges: the first is to quantify convergence of iterative algorithms;
the second challenge is to develop efficient methods for these large-scale problems 
without sacrificing the quantification of convergence. We review the state of the art 
for these challenges.


12.1 Introduction


In this chapter we review progress towards addressing two main challenges in sci- 
entific image processing. The first of these is to quantify convergence of iterative 
algorithms for image processing to solutions (as opposed to optimal values) to the 
underlying variational problem. The second challenge is to develop efficient methods 
for these large-scale problems without sacrificing the quantification of convergence.
The techniques surveyed here were studied in [1-3]. We present only the main results
from these studies, in the context that hindsight provides. 

Scientific images are often processed with software that accomplishes a number 
of tasks like registration, denoising and deblurring. Implicit in the processing is 
that some systematic error is being corrected to bring the image closer to the truth. 
This presumption is more complicated for denoising and deblurring. These are often 
accomplished by filtering or by solving some variational problem such as minimizing 
the variance of an image. For applications requiring speedy processing, such as audio 
and video communication, this is sufficient. But the recent development of nanoscale 
photonic imaging modalities such as STED and RESOLFT featured in Chaps. 1, 7 
and 9 has shifted the focus of image denoising and deconvolution from qualitative 
to quantitative models. 

Quantitative approaches to image processing are the subject of Chap. 11 where 
statistical multiscale estimation is discussed (see Sect. 11.3). Here, the recovered 
image comes with statistical statements about how far the processed image is, in some 
statistical sense, from the truth. The estimators are almost exclusively variational, 
that is, they can be characterized as the solution to an optimization problem. It is 
important to emphasize that the value of the optimization problem is meaningless. 
This stands in stark contrast to many conventional applications in economics and 
operations research, where the value of the optimal solution is related to profit or 
cost, and so is of principal interest. 

The focus on optimal solutions rather than optimal values places heavy demands
on the structure of model formulations and the algorithms for solving them. Unless
the numerical method allows one to state how far a computed iterate is from the
solution of the underlying variational problem, the scientific significance of the iterate
is lost.

The leading computational approaches for solving imaging problems with
multi-resolution statistical estimation criteria are based on iterated proximal operators.
Most of the analysis for first-order iterative proximal methods is limited to statements
about rates of convergence of function values, if rates are discussed at all (see for
instance [4-7]). First-order methods have slow convergence in the worst-case
scenario. A common assumption to guarantee linear convergence of the iterates is strong
convexity, but this is far more than is necessary, and in particular it is not satisfied 
for the Huber function (12.35). It was shown in [8] that metric subregularity is nec- 
essary for local linear convergence. Aspelmeier, Charitha and Luke [1] showed that 
the popular alternating directions method of multipliers algorithm (ADMM) applied 
to optimization problems with piecewise linear-quadratic objective functions (e.g. 
the Huber function), together with linear inequality constraints generically satisfies 
metric subregularity at isolated critical points; hence linear convergence of the iter- 
ates for this algorithm can be expected without further ado. More recently, in [3] it 
was shown that the primal iterates of a modification of the PAPC algorithm (Algo- 
rithm 2) converge R-linearly for any quadratically supportable objective function 
(for instance, the Huber function). Conventional results without metric subregularity 
obtain a convergence rate of O(1/k) with respect to the function values. In settings 
like qualitative image processing or machine learning, such results are acceptable,
but in the setting of statistical image processing these statements do not contain any
scientific content. We present in this chapter efficient iterative first-order methods 
that offer some hope of quantitative guarantees about the distance of the iterates to 
optimal solutions. 


12.2 Problem Formulation 


We limit our scope to the real vector space $\mathbb{R}^n$ with the norm generated from the
inner product. The closed unit ball centered on the point $y \in \mathbb{R}^n$ is denoted by
$\mathbb{B}(y)$. The positive orthant (resp. negative orthant) in $\mathbb{R}^n$ is denoted by $\mathbb{R}^n_+$ (resp.
$\mathbb{R}^n_-$). The domain of an extended real-valued function $\varphi : \mathbb{R}^n \to \overline{\mathbb{R}} = (-\infty, +\infty]$
is $\operatorname{dom}(\varphi) = \{z \in \mathbb{R}^n : \varphi(z) < +\infty\}$. The Fenchel conjugate of $\varphi$ is denoted by $\varphi^*$
and is defined by $\varphi^*(u) = \sup_{z \in \mathbb{R}^n}\{\langle z, u \rangle - \varphi(z)\}$. The set of symmetric $n \times n$
positive (semi)-definite matrices is denoted by $\mathbb{S}^n_{++}$ ($\mathbb{S}^n_+$). The notation $A \succ 0$ ($A \succeq 0$)
denotes a positive (semi)definite matrix $A$. For any $z \in \mathbb{R}^n$ and any $A \in \mathbb{S}^n_+$, we
denote the semi-norm $\|z\|_A^2 := \langle z, Az \rangle$. The operator norm is defined by
$\|A\| = \max_{u \in \mathbb{R}^n}\{\|Au\| : \|u\| = 1\}$ and coincides with the spectral radius of $A$ whenever $A$
is symmetric. If $A \neq 0$, $\sigma_{\min}(A)$ denotes its smallest nonzero singular value. For a
sequence $\{z^k\}_{k \in \mathbb{N}}$ converging to $z^*$, we say the convergence is Q-linear if there exists
$c \in (0, 1)$ such that

$$\frac{\|z^{k+1} - z^*\|}{\|z^k - z^*\|} \le c \quad \text{for all } k;$$

convergence is R-linear if there exists a sequence $\eta_k$ such that $\|z^k - z^*\| \le \eta_k$ and
$\eta_k \to 0$ Q-linearly [9, Chap. 9].

We limit our discussion to proper (nowhere equal to $-\infty$ and finite at some point),
lower semi-continuous (lsc), extended-valued (can take the value $+\infty$) functions. We
will, in fact, limit our discussion to convex functions, but convexity is not the central
property governing quantitative convergence estimates. By the subdifferential of a
function $\varphi$, denoted $\partial\varphi$, we mean the collection of all subgradients that can be
written as limits of sequences of Fréchet subgradients at nearby points; a vector $v$ is
a (Fréchet) subgradient of $\varphi$ at $y$, written $v \in \hat{\partial}\varphi(y)$, if

$$\liminf_{x \to y,\ x \neq y} \frac{\varphi(x) - \varphi(y) - \langle v, x - y \rangle}{\|x - y\|} \ge 0. \tag{12.1}$$


The functions of interest for us are subdifferentially regular on their domains, that
is, the epigraphs of the functions are Clarke regular at points where they are finite
[10, Definition 7.25]. For our purposes it suffices to note that, for a function $\varphi$ that is
subdifferentially regular at a point $y$, the subdifferential is nonempty and all
subgradients are Fréchet subgradients, that is, $\partial\varphi(y) = \hat{\partial}\varphi(y) \neq \emptyset$. Convex functions, in
particular, are subdifferentially regular on their domains and the subdifferential has
the particularly simple representation as the set of all vectors $v$ where

$$\varphi(x) - \varphi(y) - \langle v, x - y \rangle \ge 0 \quad \forall x. \tag{12.2}$$

A mapping $\Phi : \mathbb{R}^n \rightrightarrows \mathbb{R}^n$ is said to be $\beta$-inverse strongly monotone [10, Corollary
12.55] if for all $x, x' \in \mathbb{R}^n$

$$\langle v - v', x - x' \rangle \ge \beta\|v - v'\|^2, \quad \text{whenever } v \in \Phi(x),\ v' \in \Phi(x'). \tag{12.3}$$

The mapping $\Phi$ is said to be polyhedral (or piecewise polyhedral [10]) if its graph is
the union of finitely many sets that are polyhedral convex in $\mathbb{R}^n \times \mathbb{R}^n$ [11]. Polyhedral
mappings are generated by the subdifferential of piecewise linear-quadratic functions
(see Proposition 12.9).


Definition 12.1 (piecewise linear-quadratic (plq) functions) A function
$f : \mathbb{R}^n \to [-\infty, +\infty]$ is called piecewise linear-quadratic if $\operatorname{dom} f$ can be represented as the
union of finitely many polyhedral sets, relative to each of which $f(x)$ is given by an
expression of the form $\frac{1}{2}\langle x, Ax \rangle + \langle a, x \rangle + \alpha$ for some scalar $\alpha \in \mathbb{R}$, vector $a \in \mathbb{R}^n$,
and symmetric matrix $A \in \mathbb{R}^{n \times n}$.


Closely related to plq functions are quadratically supportable functions.


Definition 12.2 (pointwise quadratically supportable (pqs)) A proper,
extended-valued function $\varphi : \mathbb{R}^n \to \mathbb{R} \cup \{+\infty\}$ is said to be pointwise quadratically
supportable at $y$ if it is subdifferentially regular there and there exists a neighborhood $V$ of
$y$ and a constant $\mu > 0$ such that

$$(\forall v \in \partial\varphi(y)) \quad \varphi(x) \ge \varphi(y) + \langle v, x - y \rangle + \frac{\mu}{2}\|x - y\|^2, \quad \forall x \in V. \tag{12.4}$$

If for each bounded neighborhood $V$ of $y$ there exists a constant $\mu > 0$ such that
(12.4) holds, then the function $\varphi$ is said to be pointwise quadratically supportable
at $y$ on bounded sets. If (12.4) holds with one and the same constant $\mu > 0$ on all
neighborhoods $V$, then $\varphi$ is said to be uniformly pointwise quadratically supportable
at $y$.


For more on the relationship between pointwise quadratic supportability, coerciv- 
ity, strong monotonicity and strong convexity see [3]. 

We denote the resolvent of $\Phi$ by $J_\Phi = (\mathrm{Id} + \Phi)^{-1}$ where $\mathrm{Id}$ denotes the identity
mapping and the inverse is defined as

$$\Phi^{-1}(y) = \{x \in \mathbb{R}^n \mid y \in \Phi(x)\}. \tag{12.5}$$

The corresponding reflector is defined by $R_\Phi = 2J_\Phi - \mathrm{Id}$. One of the more
prevalent examples of resolvents is the proximal map. For $\varphi : \mathbb{R}^n \to (-\infty, \infty]$ a proper,
lsc and convex function and for any $u \in \mathbb{R}^n$ and $Q \in \mathbb{S}^n_{++}$, the proximal map
associated with $\varphi$ with respect to the weighted Euclidean norm is uniquely defined by:

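$$\operatorname{prox}_{Q,\varphi}(u) = \operatorname*{argmin}_{z}\left\{\varphi(z) + \frac{1}{2}\|z - u\|_Q^2 : z \in \mathbb{R}^n\right\}.$$

When $Q = c^{-1}\mathrm{Id}$, $c > 0$, we simply use the notation $\operatorname{prox}_{c,\varphi}(u)$. We also recall the
fundamental Moreau proximal identity [12], that is, for any $z \in \mathbb{R}^n$

$$z = \operatorname{prox}_{Q,\varphi}(z) + Q^{-1}\operatorname{prox}_{Q^{-1},\varphi^*}(Qz), \tag{12.6}$$

where $Q^{-1}$ is the inverse of $Q \in \mathbb{S}^n_{++}$.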
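In the scalar case with a weight matrix equal to 1/c times the identity, the Moreau identity reads z = prox of c·phi at z, plus c times the prox of the conjugate (scaled by 1/c) at z/c. The following sketch checks this for the absolute value function, as an illustration only. The function names are hypothetical. The facts used are standard: the proximal map of c|·| is soft-thresholding, the conjugate of |·| is the indicator of [-1, 1], and the proximal map of an indicator is the projection, whatever the scaling.

```python
def prox_abs(z, c):
    """Proximal map of c * |.|: soft-thresholding with threshold c."""
    if z > c:
        return z - c
    if z < -c:
        return z + c
    return 0.0


def prox_conj_abs(w):
    """Proximal map of the conjugate of |.|, i.e. projection onto [-1, 1]."""
    return max(-1.0, min(1.0, w))


c = 0.7
points = [-2.0, -0.3, 0.0, 0.5, 1.9]
# Right-hand side of the scalar Moreau identity for each test point.
recombined = [prox_abs(z, c) + c * prox_conj_abs(z / c) for z in points]
```

Each entry of `recombined` reproduces the corresponding entry of `points` up to rounding. This decomposition is what allows a proximal step on a function to be traded for one on its conjugate.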

Notions of continuity of set-valued mappings have been thoroughly developed 
over the last 40 years. Readers are referred to the monographs [10, 11, 13] for 
basic results. A key property of set-valued mappings that we will rely on is metric 
subregularity, which can be understood as the property corresponding to a Lipschitz- 
like continuity of the inverse mapping relative to a specific point. It is a weaker 
property than metric regularity which, in the case of an $n \times m$ matrix for instance, is
equivalent to surjectivity. Our definition follows the characterization of this property 
given in [11, Exercise 3H.4]. 


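Definition 12.3 (metric subregularity) The mapping $\Phi : \mathbb{R}^n \rightrightarrows \mathbb{R}^n$ is called
metrically subregular at $\bar{x}$ for $\bar{y}$ relative to $W \subset \mathbb{R}^n$ if $(\bar{x}, \bar{y}) \in \operatorname{gph}\Phi$ and there is a
constant $c > 0$ and a neighborhood $\mathcal{O}$ of $\bar{x}$ such that

$$\operatorname{dist}\bigl(x, \Phi^{-1}(\bar{y}) \cap W\bigr) \le c\,\operatorname{dist}\bigl(\bar{y}, \Phi(x)\bigr) \quad \forall x \in \mathcal{O} \cap W. \tag{12.7}$$

The constant $c$ measures the stability under perturbations of the inclusion $\bar{y} \in \Phi(\bar{x})$.
An important instance where metric subregularity comes for free is for polyhedral
mappings.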

Proposition 12.4 (polyhedrality implies metric subregularity) Let $W \subset V$ be an
affine subspace and $T : W \rightrightarrows W$. If $T$ is polyhedral and $\operatorname{Fix} T \cap W$ is an isolated
point, $\{\bar{x}\}$, then $T - \mathrm{Id} : W \rightrightarrows (W - \bar{x})$ is metrically subregular at $\bar{x}$ for 0 relative
to $W$.


Proof Polyhedrality and isolated fixed points in fact imply strong metric
subregularity. See [11, Propositions 3I.1 and 3I.2].


A notion related to metric regularity is that of weak-sharp solutions. This will be 
used in the development of error bounds (Theorem 12.6). 


Definition 12.5 (weak sharp minimum [14]) The solution set $S_f := \operatorname{argmin}\{f(x) \mid x \in \Omega\}$
for a nonempty closed convex set $\Omega$ is weakly sharp if, for $\bar{p} = \inf_\Omega f$, there exists
a positive number $\alpha$ (sharpness constant) such that

$$f(x) \ge \bar{p} + \alpha\operatorname{dist}(x, S_f) \quad \forall x \in \Omega.$$

Similarly, the solution set $S_f$ is weakly sharp of order $\nu > 0$ if there exists a positive
number $\alpha$ (sharpness constant) such that, for each $x \in \Omega$,

$$f(x) \ge \bar{p} + \alpha\operatorname{dist}(x, S_f)^\nu \quad \forall x \in \Omega.$$


12.2.1 Abstract Problem


The generic problem in which we are interested is 


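$$\begin{array}{ll} \underset{x \in \mathbb{R}^n}{\text{minimize}} & f(x) \\ \text{subject to} & g_i(A_i x) \le 0 \quad (i = 1, \dots, M). \end{array} \tag{$P_0$}$$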

The following blanket assumptions on the problem’s data hold throughout: 


Assumption 1


(i) The set of optimal solutions for problem $(P_0)$, denoted $S^*$, is nonempty.
(ii) The function $f : \mathbb{R}^n \to \mathbb{R}$ is proper, lsc, convex and coercive, and for
$i = 1, \dots, M$ the functions $g_i : \mathbb{R}^{m_i} \to (-\infty, +\infty]$ are proper, lsc, and convex.
(iii) The mappings $A_i : \mathbb{R}^n \to \mathbb{R}^{m_i}$, $i = 1, \dots, M$ ($m_i \le n$) are linear and full
rank, that is, $\sigma_{\min}^2(A_i) = \lambda_{\min}(A_i A_i^T) > 0$.


Assumption (i) implies that the optimal value of $(P_0)$ is finite. Assumption (ii)
implies that the constraint structure is convex. Assumption (iii) implies that the
mapping $A : \mathbb{R}^n \to \mathbb{R}^m$ is linear and full rank, where

$$A = [A_1^T, A_2^T, \dots, A_M^T]^T \in \mathbb{R}^{m \times n}$$

so that $Ax = y$ where $y = (y_1, \dots, y_M) \in \mathbb{R}^m$ for $m = \sum_{i=1}^{M} m_i$. The challenge of
statistical multi-resolution estimation lies in the feature that the dimension of the
constraint structure, $m$, is much greater than the dimension of the unknowns, $n$, and
grows superlinearly with respect to the number of unknowns.

The above constrained optimization problem is often formulated as an
unconstrained-looking problem via the introduction of a (nonsmooth) penalty term
enforcing the constraints:

$$\min_{x \in \mathbb{R}^n} f(x) + g(Ax) \tag{$P$}$$

where

$$g : \mathbb{R}^m \to \overline{\mathbb{R}}, \quad g := \rho\theta \tag{12.8}$$

for $\rho$ a positive scalar and

$$\theta : \mathbb{R}^m \to [0, +\infty] \text{ proper, lsc and convex with } \theta(y) = 0 \text{ if and only if } y \in C \tag{12.9}$$

where

$$C = \{y = (y_1, y_2, \dots, y_M) \mid y \in \operatorname{range} A \text{ and } g_i(y_i) \le 0,\ i = 1, 2, \dots, M\}. \tag{12.10}$$

The requirements on the function $\theta$ align this penalty term with exact penalization
[15], that is, a relaxation of the constraints where, for all parameters $\rho$ large enough,
the constraints are exactly satisfied.
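A one-dimensional toy problem shows what exact penalization means in practice. The example below is hypothetical and chosen only for illustration: the objective is f(x) = (x - 2)^2 / 2, the constraint is x <= 1, and the penalty is rho times max(x - 1, 0), which vanishes exactly on the feasible set. A brute-force grid search stands in for a proper solver.

```python
def penalized_objective(x, rho):
    """f(x) + rho * theta(x) with f(x) = (x - 2)^2 / 2 and theta(x) = max(x - 1, 0)."""
    return 0.5 * (x - 2.0) ** 2 + rho * max(x - 1.0, 0.0)


def grid_minimizer(rho):
    """Brute-force minimizer over the grid 0, 0.001, ..., 3 (toy problems only)."""
    grid = [i / 1000 for i in range(3001)]
    return min(grid, key=lambda x: penalized_objective(x, rho))


x_small_rho = grid_minimizer(0.5)   # penalty too weak: the constraint is violated
x_large_rho = grid_minimizer(2.0)   # penalty strong enough: constrained solution
```

For rho = 0.5 the penalized minimizer is 1.5, which violates the constraint. For rho = 2 it is exactly 1, the solution of the constrained problem, and it stays there for every larger rho. In this example the threshold is rho = 1, the size of the slope of f at the constrained solution.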

The following assumptions are used to guarantee the exact correspondence 
between solutions to (Po) and (P). 


Assumption 2 


(i) The set $S_\theta := \operatorname{argmin}\{\theta(Ax) \mid x \in \mathbb{R}^n\}$ is nonempty and weakly
sharp (Definition 12.5) of order $\nu \ge 1$.
(ii) The lower level set $\operatorname{lev}_{\le \alpha} f$ is bounded for each $\alpha \in \mathbb{R}$ and $\inf_{x \in \mathbb{R}^n} f > -\infty$.


Proposition 12.6 Suppose Assumption 2 holds. Then the set of solutions to $(P)$ with
$g$ defined by (12.8) is bounded and for all $\rho$ large enough, the solutions to $(P_0)$ and
$(P)$ coincide.


Proof This is a distillation of [1, Theorem 3.4]. 


In (Po) and (P) the function f is often smooth, but not prox friendly. In applica- 
tions it is most often a smooth regularization or a fidelity term. For the ADMM/DR 
method reviewed in Sect. 12.3 smoothness is not required. 

It is assumed that the functions $g_i$ ($i = 1, 2, \dots, M$) are prox friendly and that they
enjoy some structure that makes $g$ also prox friendly. For instance, if the constraints
are separable, then the function

$$g(Ax) = \rho\sum_{i=1}^{M} g_i(A_i x) \tag{12.11}$$

is also prox-friendly, as is the function

$$g(Ax) = \rho\max\{g_1(A_1 x), g_2(A_2 x), \dots, g_M(A_M x), 0\}. \tag{12.12}$$

The functions $g_i \circ A_i$ can be regularizing functions (like total variation) or hard
inequality constraints. For example, hard inequality constraints are modeled by the
use of indicator functions for $g_i$ in $(P_0)$: $g_i(A_i x) = \iota_{\mathbb{R}^{m_i}_-}(A_i x - b_i)$ where, for a
subset $\Omega \subset \mathbb{R}^n$,

$$\iota_\Omega(x) = \begin{cases} 0 & x \in \Omega \\ +\infty & \text{else.} \end{cases}$$


12.2.2 Saddle Point and Dual Formulations 


The saddle point formulation is derived by viewing the function $g$ in (P) as the
image of a function $g^*$ under Fenchel conjugation, that is, $g = (g^*)^*$. Writing this
explicitly into (P) yields

$$\min_{x \in \mathbb{R}^n} \Big\{ \max_{y \in \mathbb{R}^m} \; f(x) + \langle Ax, y \rangle - g^*(y) \Big\}. \tag{S}$$

The bifunction in the saddle point formulation is

$$L(x, y) = f(x) + \langle Ax, y \rangle - g^*(y). \tag{12.13}$$


Contrast this with the Lagrangian for the extended problem 


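$$\underset{(x, y) \in \mathbb{R}^n \times \mathbb{R}^m}{\text{minimize}} \; f(x) + g(y) \quad \text{subject to } Ax = y. \tag{Pc}$$

The Lagrangian is

$$L(x, y, z) = f(x) + g(y) + \langle z, Ax - y \rangle, \tag{12.14}$$

and the augmented Lagrangian $\widetilde{L}$ for (Pc) is given by

$$\widetilde{L}(x, y, z) = f(x) + g(y) + \langle z, Ax - y \rangle + \frac{\eta}{2} \|Ax - y\|^2, \tag{12.15}$$

where $z \in \mathbb{R}^m$, and $\eta > 0$ is a fixed penalty parameter.
Assumption 1(i) guarantees that the mapping $L(\cdot, \cdot)$ has a saddle point, that is,
there exists $(\hat{x}, \hat{y}) \in \mathbb{R}^n \times \mathbb{R}^m$ such that

$$L(\hat{x}, y) \le L(\hat{x}, \hat{y}) \le L(x, \hat{y}) \qquad \forall x \in \mathbb{R}^n,\ y \in \mathbb{R}^m.$$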


The existence of a saddle point corresponds to zero duality gap for the induced 
optimization problems 


$$p(x) = \sup\{L(x, y) : y \in \mathbb{R}^m\}, \qquad q(y) = \inf\{L(x, y) : x \in \mathbb{R}^n\}.$$

By weak duality, we have $\inf_{x \in \mathbb{R}^n} p(x) \ge \sup_{y \in \mathbb{R}^m} q(y)$.
This can be viewed as a partial dual to problem (P). The full dual problem involves
the Fenchel conjugate of the entire objective function. For (P) the dual problem is

$$\sup_{y \in \mathbb{R}^m} \; -f^*(A^T y) - g^*(-y). \tag{D'}$$

Instead of working with this dual, it is more convenient to work with the following
equivalent formulation via the change of variable $y \mapsto -y$:

$$\inf_{y \in \mathbb{R}^m} \; f^*(-A^T y) + g^*(y). \tag{D}$$


Under standard constraint qualifications (e.g., [16, Theorem 2.3.4]), $(\hat{x}, \hat{y})$ is a saddle
point of $L$ if and only if $\hat{x}$ is an optimal solution of the primal problem $(P_0)$, and $\hat{y}$ is an
optimal solution of the dual problem (D). The following two inclusions characterize
the solutions of the problems $(P_0)$ and (D) respectively:

$$0 \in \partial f(\hat{x}) + \partial (g \circ A)(\hat{x}); \qquad 0 \in \partial \big( f^* \circ (-A^T) \big)(\hat{y}) + \partial g^*(\hat{y}).$$

In both cases, one has to solve an inclusion of the form

$$0 \in (B + D)(x), \tag{12.16}$$


for general set-valued mappings B and D. 


12.2.3 Statistical Multi-resolution Estimation 


Statistical multi-resolution estimation (SMRE) discussed in Sect. 11.2.7 of Chap. 11
is specialized here for the case of imaging systems with Gaussian noise. Let
$G = \{1, \ldots, N\} \times \{1, \ldots, N\}$ denote a grid of $N^2 = n$ points. Denote by $\mathcal{V}$ a collection
of subsets of $G$. Suppose there are $M$ such subsets, that is $M = |\mathcal{V}|$, each of these
subsets $V_i$ consisting of $m_i \le n$ grid points with $\sum_{i=1}^{M} m_i = m > n$.

The variational model for statistical multi-resolution estimation with Gaussian 
noise takes the form 


$$\underset{x \in \mathbb{R}^n}{\text{minimize}} \; f(x) \quad \text{subject to} \quad \Big| \sum_{j \in V_i} \omega_i(j) \big( (Ax)_j - b_j \big) \Big| \le \gamma_i \quad \forall\, i = 1, \ldots, M. \tag{$P_{\mathrm{SMRE}}$}$$

Here $f : \mathbb{R}^n \to \mathbb{R}$ is a regularization functional, which incorporates a priori knowledge
about the unknown signal $x$ such as smoothness, $\omega_i$ is a weighting function for
the grid points in the subset $V_i$, and $A : \mathbb{R}^n \to \mathbb{R}^n$ is the linear imaging operator that
models the experiment. The constant $\gamma_i$ has an interpretation in terms of the quantile
of the estimator.
In the context of the general model $(P_0)$,

$$g_i(A_i x) := \Big| \sum_{j \in V_i} \omega_i(j) \big( (Ax)_j - b_j \big) \Big| - \gamma_i, \quad i = 1, 2, \ldots, M. \tag{12.17}$$

Here the affine mapping $A_i x = \sum_{j \in V_i} \omega_i(j) ((Ax)_j - b_j)$ is an averaging operator
that accounts for sampling at different resolutions of the image. Note that the observation
$b$ need not be in the range of the imaging operator $A$; all that is assumed is
that this mapping is injective, not surjective. This means that, in applications, practitioners
need to be careful not to make the constraint $\gamma_i$ too small, otherwise the
optimization problem might be infeasible. If the algorithms presented below appear
to be diverging for a particular instance of $(P_{\mathrm{SMRE}})$, it is because the problem is
infeasible; increasing the constants $\gamma_i$ should solve the problem.
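
The constraint functions (12.17) can be illustrated with a minimal sketch. Everything specific in it is an assumption made for illustration and is not taken from the text: the imaging operator is the identity, the subsets $V_i$ are all contiguous square windows of side 1, 2 and 3, the weights are uniform, and the function names are hypothetical.

```python
import numpy as np

def smre_constraints(x, b, scales=(1, 2, 3), gamma=0.25):
    """Evaluate g_i = |sum_{j in V_i} w_i(j) (x_j - b_j)| - gamma_i on square windows.

    Illustrative assumptions (not from the text): A = Id, V_i ranges over all
    contiguous s-by-s windows for each s in `scales`, w_i(j) = 1/s on a window
    of side s, and gamma_i = gamma * s.
    """
    r = np.asarray(x, dtype=float) - np.asarray(b, dtype=float)
    n_rows, n_cols = r.shape
    values = []
    for s in scales:
        for top in range(n_rows - s + 1):
            for left in range(n_cols - s + 1):
                weighted_sum = r[top:top + s, left:left + s].sum() / s
                values.append(abs(weighted_sum) - gamma * s)
    return np.array(values)

def is_feasible(x, b, **kwargs):
    """True when every constraint function is nonpositive."""
    return bool(np.all(smre_constraints(x, b, **kwargs) <= 0.0))
```

With such a helper one can check whether a candidate reconstruction violates any window constraint before suspecting the algorithm itself; enlarging `gamma` relaxes all constraints at once.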


12.3 Alternating Directions Method of Multipliers 
and Douglas Rachford 


In this section we survey the main results (without proofs) from [1]. For proofs
of the statements, readers are referred to that article. A starting point for most of
the main approaches to solving $(P_0)$ is the alternating directions method of multipliers
(ADMM) (primary sources include [17-21]). This method is one of many
splitting methods which are the principal approach to handling the computational
burden of large-scale, separable problems [22]. ADMM belongs to a class of augmented
Lagrangian methods whose original motivation was to regularize Lagrangian
formulations of constrained optimization problems. The ADMM algorithm for solving
(Pc) follows. The penalty parameter $\eta$ need not be a constant, and indeed evidence
indicates that the choice of $\eta$ can greatly impact the complexity of the algorithm. For
simplicity we keep this parameter fixed.


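Algorithm 1: ADMM for (Pc) as in [1, Algorithm 2.1].
Parameters : Set $\eta > 0$
Initialization: Set $(x^0, y^0, z^0) \in \mathbb{R}^n \times \mathbb{R}^m \times \mathbb{R}^m$

1 for $k = 0, 1, 2, \ldots$ do

2 | $x^{k+1} \in \operatorname{argmin}_x \{ f(x) + \langle z^k, Ax \rangle + \frac{\eta}{2} \|Ax - y^k\|^2 \}$;

3 | $y^{k+1} \in \operatorname{argmin}_y \{ g(y) - \langle z^k, y \rangle + \frac{\eta}{2} \|Ax^{k+1} - y\|^2 \}$;

4 | $z^{k+1} = z^k + \eta (Ax^{k+1} - y^{k+1})$.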


We do not specify how the argmin in Algorithm 1 should be calculated, and
indeed, the analysis that follows assumes that these can be computed exactly.¹ One
problem that should be immediately apparent is that this algorithm operates on a
space of dimension $n + 2m$. Since one of the two challenges we address is high
dimension, this expansion in the dimension of the problem formulation should be
troubling. Nevertheless, we show with this algorithm how the first challenge, namely
quantification of convergence, is achieved.
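
The three updates of Algorithm 1 can be sketched on a toy instance where both argmin steps have closed forms. The instance is an assumption chosen for illustration, not the SMRE problem of the text: $f(x) = \alpha\|x\|^2$ and $g$ the indicator function of the box $\{y : |y_i - b_i| \le \gamma\}$, so that the $x$-update is a linear solve and the $y$-update is a componentwise clip. The function name is hypothetical.

```python
import numpy as np

def admm_box(A, b, gamma, alpha=0.01, eta=1.0, iters=500):
    """ADMM sketch for min alpha*||x||^2 subject to |(Ax)_i - b_i| <= gamma."""
    A = np.asarray(A, dtype=float)
    m, n = A.shape
    x, y, z = np.zeros(n), np.zeros(m), np.zeros(m)
    # x-update: minimizer of alpha*||x||^2 + <z, Ax> + (eta/2)*||Ax - y||^2,
    # i.e. the solution of (2*alpha*I + eta*A^T A) x = A^T (eta*y - z).
    K = 2.0 * alpha * np.eye(n) + eta * A.T @ A
    for _ in range(iters):
        x = np.linalg.solve(K, A.T @ (eta * y - z))
        # y-update: projection of Ax + z/eta onto the box around b.
        y = np.clip(A @ x + z / eta, b - gamma, b + gamma)
        # multiplier update
        z = z + eta * (A @ x - y)
    return x, y, z
```

For $A = \mathrm{Id}$, $b = (1, -2)$ and $\gamma = 0.5$ the minimum-norm feasible point is $(0.5, -1.5)$, which the iteration reaches quickly; for the imaging problems in the text the $x$-update is the expensive step.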

¹ This is not true in practice and remains an unresolved issue in numerical variational analysis.

The connection between the ADMM algorithm and the Douglas-Rachford algorithm
introduced in Chap. 6, (6.30) was first discovered by Gabay [19] (see also the
thesis of Eckstein [17]). For any $\eta > 0$, the Douglas-Rachford algorithm [23, 24]
for solving (12.16) is given by

$$z^{k+1} \in T' z^k \quad (k \in \mathbb{N}), \tag{12.18}$$
$$\text{for } T' := J_{\eta D} \big( J_{\eta B} (\mathrm{Id} - \eta D) + \eta D \big), \tag{12.19}$$

where $J_{\eta D} := (\mathrm{Id} + \eta D)^{-1}$ and $J_{\eta B} := (\mathrm{Id} + \eta B)^{-1}$ are the resolvents of $\eta D$ and $\eta B$
respectively.

Given $z^0$ and $y^0 \in D z^0$, following [25], define the new variable $\xi^0 := z^0 + \eta y^0$
so that $z^0 = J_{\eta D} \xi^0$. We thus arrive at an alternative formulation of the Douglas-Rachford
algorithm (12.18):

$$\xi^{k+1} \in T \xi^k \quad (k \in \mathbb{N}), \tag{12.20}$$
$$\text{for } T := \tfrac{1}{2} (R_{\eta B} R_{\eta D} + \mathrm{Id}) = J_{\eta B} (2 J_{\eta D} - \mathrm{Id}) + (\mathrm{Id} - J_{\eta D}), \tag{12.21}$$

where $R_{\eta D}$ and $R_{\eta B}$ are the reflectors of the respective resolvents. This is the form
of Douglas-Rachford considered in [26].
Specializing this to our application yields 


$$B := \partial \big( f^* \circ (-A^T) \big) \quad \text{and} \quad D := \partial g^*, \tag{12.22}$$

and so the resolvent mappings are the proximal mappings of the convex functions
$f^* \circ (-A^T)$ and $g^*$ respectively, and hence the resolvent mappings and corresponding
fixed point operator $T$ are single-valued [12].
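
The fixed point iteration (12.20) with the operator (12.21) can be sketched for generic single-valued resolvents. The toy operators below are assumptions for illustration and are not the operators (12.22): $B$ is the gradient of $\tfrac12\|x - a\|^2$ and $D$ is the normal cone of the box $[0,1]^2$, so that $0 \in (B + D)(x)$ is solved by the projection of $a$ onto the box. The function name is hypothetical.

```python
import numpy as np

def douglas_rachford(J_B, J_D, xi0, iters=200):
    """Iterate xi <- J_B(2 J_D xi - xi) + xi - J_D xi and return (J_D xi, xi)."""
    xi = np.asarray(xi0, dtype=float)
    for _ in range(iters):
        z = J_D(xi)                      # shadow point z = J_{eta D} xi
        xi = J_B(2.0 * z - xi) + xi - z  # one application of T from (12.21)
    return J_D(xi), xi

# Toy instance (assumed for illustration only)
eta = 1.0
a = np.array([2.0, -1.0])
J_B = lambda v: (v + eta * a) / (1.0 + eta)  # resolvent of eta * grad(0.5*||x - a||^2)
J_D = lambda v: np.clip(v, 0.0, 1.0)         # resolvent of the normal cone of [0, 1]^2
z_bar, xi_bar = douglas_rachford(J_B, J_D, np.zeros(2))
```

The solution of the inclusion is the shadow point `z_bar`, not the fixed point `xi_bar` itself, which mirrors the role of $\bar z = J_{\eta D}\bar\xi$ in the propositions that follow.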


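Proposition 12.7 (Proposition 2.2 [1]) Let $f : \mathbb{R}^n \to \mathbb{R}$ and $g : \mathbb{R}^m \to \mathbb{R}$ be proper,
lsc and convex. Let $A : \mathbb{R}^n \to \mathbb{R}^m$ be linear and suppose there exists a solution
to $0 \in (B + D)(x)$ for $B$ and $D$ defined by (12.22). For fixed $\eta > 0$, given any
initial points $\xi^0$ and $(y^0, z^0) \in \operatorname{gph} D$ such that $\xi^0 = z^0 + \eta y^0$, the sequences
$(z^k)_{k \in \mathbb{N}}$, $(\xi^k)_{k \in \mathbb{N}}$ and $(y^k)_{k \in \mathbb{N}}$ defined respectively by (12.18), (12.20) and $y^k := \frac{1}{\eta}(\xi^k - z^k)$
converge to points $\bar{z} \in \operatorname{Fix} T'$, $\bar{\xi} \in \operatorname{Fix} T$ and $\bar{y} \in D(\operatorname{Fix} T')$. The point $\bar{z} = J_{\eta D} \bar{\xi}$
is a solution to (D), and $\bar{y} = \frac{1}{\eta}(\bar{\xi} - \bar{z}) \in D \bar{z}$. If, in addition, $A$ has full column
rank, then the sequence $(y^k, z^k)_{k \in \mathbb{N}}$ corresponds exactly to the sequence of points
generated in steps 2-3 of Algorithm 1 and the sequence $(x^{k+1})_{k \in \mathbb{N}}$ converges to $\bar{x}$, a
solution to $(P_0)$.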


The correspondence between Douglas-Rachford and ADMM in the proposition 
above means that if quantitative convergence can be established for one of the algo- 
rithms, it is automatically established for the other. Linear convergence of Douglas- 
Rachford under the assumption of strong convexity and Lipschitz continuity of f 
was already established by Lions and Mercier [26]. Recent published work in this 
direction includes [27, 28]. Local linear convergence of the iterates to a solution 
was established in [29] for linear and quadratic programs using spectral analysis. 
In Proposition 12.8, two conditions are given that guarantee linear convergence of 
the ADMM iterates to a solution. The first condition is classical and follows Lions 
and Mercier [26]. The second condition, based on [30], is much more prevalent in 
applications and generalizes the results of [29]. 
Proposition 12.8 (Theorem 2.3 of [1]) Let $f : \mathbb{R}^n \to \mathbb{R}$ and $g : \mathbb{R}^m \to \mathbb{R}$ be proper,
lsc and convex. Suppose there exists a solution to $0 \in (B + D)(x)$ for $B$ and $D$
defined by (12.22) where $A : \mathbb{R}^n \to \mathbb{R}^m$ is an injective linear mapping. Let $\bar{\xi} \in \operatorname{Fix} T$
for $T$ defined by (12.21). For fixed $\eta > 0$ and any given triplet of points $(\xi^0, y^0, z^0)$
satisfying $\xi^0 = z^0 + \eta y^0$, with $y^0 \in D z^0$, generate the sequence $(y^k, z^k)_{k \in \mathbb{N}}$ by Steps
2-3 of Algorithm 1 and the sequence $(\xi^k)_{k \in \mathbb{N}}$ by (12.20).

(i) Let $\mathcal{O} \subset \mathbb{R}^m$ be a neighborhood of $\bar{\xi}$ on which $g$ is strongly convex with constant
$\mu$ and $\partial g$ is $\beta$-inverse strongly monotone for some $\beta > 0$. Then, for
any $(\xi^0, y^0, z^0) \in \mathcal{O}$ satisfying $\xi^0 = z^0 + \eta y^0 \in \mathcal{O}$, the sequences $(\xi^k)_{k \in \mathbb{N}}$ and
$(y^k, z^k)_{k \in \mathbb{N}}$ converge linearly to the respective points $\bar{\xi} \in \operatorname{Fix} T$ and $(\bar{y}, \bar{z})$ with

rate at least $K = \left( 1 - \frac{2 \eta \beta \mu^2}{(\mu + \eta)^2} \right)^{1/2} < 1$.


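(ii) Suppose that $T : W \to W$ for some affine subspace $W \subset \mathbb{R}^m$ with $\bar{\xi} \in W$. On
the neighborhood $\mathcal{O}$ of $\bar{\xi}$ relative to $W$, that is $\mathcal{O} \cap W$, suppose there is a
constant $\kappa > 0$ such that

$$\|\xi - \xi^+\| \ge \sqrt{\kappa}\, \operatorname{dist}(\xi, \operatorname{Fix} T) \quad \forall \xi \in \mathcal{O} \cap W,\ \forall \xi^+ \in T\xi. \tag{12.23}$$

Then the sequences $(\xi^k)_{k \in \mathbb{N}}$ and $(y^k, z^k)_{k \in \mathbb{N}}$ converge linearly to the respective
points $\bar{\xi} \in \operatorname{Fix} T \cap W$ and $(\bar{y}, \bar{z})$ with rate bounded above by $\sqrt{1 - \kappa}$.


In either case, the limit point $\bar{z} = J_{\eta D} \bar{\xi}$ is a solution to (D), $\bar{y} \in D \bar{z}$ and the sequence
$(x^k)_{k \in \mathbb{N}}$ of Step 1 of Algorithm 1 converges to $\bar{x}$, a solution of $(P_0)$.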


The strong convexity assumption (i) of Proposition 12.8 fails in many applications
of interest, and in particular for feasibility problems (minimizing the sum of indicator
functions). By [31, Theorem 2.2], case (ii) of Proposition 12.8, in contrast, holds in
general for mappings $T$ for which $T - \mathrm{Id}$ is metrically subregular and the fixed
point sets are isolated points with respect to an affine subspace to which the iterates
are confined. The restriction to the affine subspace $W$ is a natural generalization
for the Douglas-Rachford algorithm, where the iterates are known to stay confined
to affine subspaces orthogonal to the fixed point set [32, 33]. We show that metric
subregularity with respect to this affine subspace holds in many applications.


Proposition 12.9 (polyhedrality of the Douglas-Rachford operator) Let $f : \mathbb{R}^n \to \mathbb{R}$
and $g : \mathbb{R}^m \to \mathbb{R}$ be proper, lsc and convex. Suppose, in addition, that $f$ and $g$
are piecewise linear-quadratic. The operator $T : \mathbb{R}^m \to \mathbb{R}^m$ defined by (12.21) with
$\eta > 0$ fixed, is polyhedral for $B$ and $D$ given by (12.22) where $A : \mathbb{R}^n \to \mathbb{R}^m$ is a
linear mapping.


Proof This is Proposition 2.6 of [1]. 


Proposition 12.10 (local linear convergence, Theorem 2.7 of [1]) Let $f : \mathbb{R}^n \to \mathbb{R}$
and $g : \mathbb{R}^m \to \mathbb{R}$ be proper, lsc, convex, piecewise linear-quadratic functions (see
Definition 12.1). Define the operator $T : \mathbb{R}^m \to \mathbb{R}^m$ by (12.21) with $\eta > 0$ fixed and
$B$ and $D$ given by (12.22) where $A : \mathbb{R}^n \to \mathbb{R}^m$ is a linear mapping. Suppose that
there exists a solution to $0 \in (B + D)(x)$, that $T : W \to W$ for $W$ some affine subspace
of $\mathbb{R}^m$ and that $\operatorname{Fix} T \cap W$ is an isolated point $\{\bar{\xi}\}$. Then there is a neighborhood
$\mathcal{O}$ of $\bar{\xi}$ such that, for all starting points $(\xi^0, y^0, z^0)$ with $\xi^0 = z^0 + \eta y^0 \in \mathcal{O} \cap W$ for
$y^0 \in D(z^0)$ so that $J_{\eta D} \xi^0 = z^0$, the sequence $(\xi^k)_{k \in \mathbb{N}}$ generated by (12.20) converges
Q-linearly to $\bar{\xi}$ where $\bar{z} := J_{\eta D} \bar{\xi}$ is a solution to (D). The rate of linear convergence
is bounded above by $\sqrt{1 - \kappa}$, where $\kappa = c^{-2} > 0$, for $c$ a constant of metric subregularity
of $T - \mathrm{Id}$ at $\bar{\xi}$ for the neighborhood $\mathcal{O}$. Moreover, the sequence $(y^k, z^k)_{k \in \mathbb{N}}$
generated by steps 2-3 of Algorithm 1 converges linearly to $(\bar{y}, \bar{z})$ with $\bar{y} = \frac{1}{\eta}(\bar{\xi} - \bar{z})$,
and the sequence $(x^k)_{k \in \mathbb{N}}$ defined by Step 1 of Algorithm 1 converges to a solution
to $(P_0)$.


12.3.1 ADMM for Statistical Multi-resolution Estimation
of STED Images 


The theoretical results above are demonstrated with an image $b \in \mathbb{R}^n$ (Fig. 12.1)
generated from a Stimulated Emission Depletion (STED) microscopy experiment
[34, 35] conducted at the Laser-Laboratorium Göttingen examining tubulin, represented
as the “object” $x \in \mathbb{R}^n$. The imaging model is simple linear convolution. The
measurement $b$, shown in Fig. 12.1, is noisy or otherwise inexact, and thus an exact
solution is not desirable. Although the noise in such images is usually modeled by
Poisson noise, a Gaussian noise model with constant variance suffices as the photon
counts are of the order of 100 per pixel and do not vary significantly across the image.
We calculate the numerically reconstructed tubulin density $x$ shown in Fig. 12.2a via
Algorithm 1 for the problem (P) with the qualitative regularization

$$f(x) := \alpha \|x\|^2 \tag{12.24}$$

and exact penalty $g(Ax)$ given by (12.12) with $g_i$ given by (12.17).


Fig. 12.1 Recorded STED image used in [1]. Inset: area to be processed (640 × 640 nm). The
scale bar (white) is 1 µm


Fig. 12.2 a Numerical reconstruction via Algorithm 1 from the imaging data shown in Fig. 12.1
for $\rho = 4096$. b Iterates of Algorithm 1: primal and dual step sizes, as well as constraint violation
$g(Ax)$ given by (12.12) with $\rho = 4096$. Reproduced from [1]

For an image size of $n = 64 \times 64$ with three resolution levels the resulting number
of constraints is $m = 12161$ (that is, $64^2$ constraints at the finest resolution, $4 \times 32^2$
constraints at the next resolution and $9 \times 21^2$ constraints at the lowest resolution).
The constant $\alpha = 0.01$ in (12.24) is used to balance the contributions of the individual
terms to make the most of limited numerical accuracy (double precision). The
constant $\gamma_i$ is chosen so that the model solution would be no more than 3 standard
deviations from the noisy data on each interval of each scale.
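
As a quick check of the constraint count stated above, the three contributions add up as follows. Reading $32 = 64/2$ and $21 = \lfloor 64/3 \rfloor$ as the number of non-overlapping windows per direction is an assumption; the text only gives the products.

```latex
m \;=\; 64^2 + 4 \cdot 32^2 + 9 \cdot 21^2 \;=\; 4096 + 4096 + 3969 \;=\; 12161 .
```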

Since this is experimental data, there is no “truth” for comparison; the constraint,
together with the error bounds on the numerical solution to the model solution, provides
statistical guarantees on the numerical reconstruction [36]. In Fig. 12.2b the iteration
is shown with the value of $\rho = 4096$ for which the constraints are exactly satisfied (to
within machine precision), indicating the correspondence of the computed solution
of problem (P) to a solution to the exact model problem $(P_0)$.

The only assumption from Proposition 12.10 that cannot be verified for this implementation
is the assumption that the algorithm fixed point is a singleton; all other
assumptions are satisfied automatically by the problem structure. We observe, however,
starting from around iteration 1500 in Fig. 12.2b, behavior that is consistent with
(i.e. does not contradict) linear convergence. From this, the observed convergence
rate is $c = 0.9997$, which yields an a posteriori upper estimate of the pixel-wise error
of about 8.9062e-4, or 3 digits of accuracy at each pixel.²
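
One standard way to turn an observed linear rate into an a posteriori error estimate is the bound $\|x^k - \bar x\| \le \frac{c}{1-c}\|x^k - x^{k-1}\|$, which follows from $\|x^k - \bar x\| \le c\,\|x^{k-1} - \bar x\|$ and the triangle inequality. The text does not state which formula produced the number above, so the sketch below is an assumption about the general technique, with a hypothetical function name.

```python
def a_posteriori_error_bound(step_norm, rate):
    """Upper bound on ||x^k - x_bar|| given ||x^k - x^{k-1}|| and a Q-linear rate in [0, 1)."""
    if not 0.0 <= rate < 1.0:
        raise ValueError("rate must lie in [0, 1)")
    return rate / (1.0 - rate) * step_norm

# For a rate of 0.9997 the step length is amplified by roughly 3.3e3.
amplification = a_posteriori_error_bound(1.0, 0.9997)
```

The amplification factor shows why a rate this close to 1 requires very small step lengths before a few digits of accuracy can be certified.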


12.4 Primal-Dual Methods 


The ADMM method presented above suffers from the extreme computational cost 
of computing the prox-operator in step 1. The results of the previous section required 
several days of cpu time on a 2016-era laptop. In this section we present a method 
studied in [3] that can achieve results in about 30 s on the same computer architecture. 
In this section we survey the main results (without proofs) from [1]. There is one 
subtle difference in the present survey over [3] that has major implications for the 
application and implementation of the main Algorithm 2. 

In this section we consider exclusively functions g in problem (P) of the form 
(12.11). The algorithm we revisit is the proximal alternating predictor-corrector 
(PAPC) algorithm proposed in [37] for solving (S). It consists of a predictor-corrector 
gradient step for handling the smooth part of L in (12.13) and a proximal step for 
handling the nonsmooth part. 


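Algorithm 2: Extended Proximal Alternating Predictor-Corrector (EPAPC)
for (S).
Parameters : Set $\tau > 0$ and choose the parameters $\tau$ and $\sigma$ to satisfy
$\tau \in (0, 1/K]$, $0 < \tau\sigma \le 1/\|AA^T\|$ and
$(\forall w \in (\ker A^T)^\perp)(\forall w_0 \in \ker A^T) \quad \frac{1}{2\sigma}\|w_0\|^2 \ge g^*(w) - g^*(w + w_0)$.
Initialization: Choose $(x^0, y^0) \in \mathbb{R}^n \times (\ker A^T)^\perp$
1 for $k = 1, 2, \ldots$ do
2 | $p^k = x^{k-1} - \tau (\nabla f(x^{k-1}) + A^T y^{k-1})$;
3 | for $i = 1, \ldots, M$ do
4 | | $w_i^k = \operatorname{argmax}_{w_i \in \mathbb{R}^{m_i}} \{ \langle A_i p^k, w_i \rangle - g_i^*(w_i) - \frac{1}{2\sigma} \|w_i - y_i^{k-1}\|^2 \}$
5 | |     $= \operatorname{prox}_{\sigma g_i^*}(y_i^{k-1} + \sigma A_i p^k)$;
6 | $y^k = P_{(\ker A^T)^\perp} w^k$;
7 | $x^k = x^{k-1} - \tau (\nabla f(x^{k-1}) + A^T y^k)$.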


² This statement does not take into account finite precision evaluation of the steps in Algorithm 1.

At each iteration the algorithm computes one gradient and a prox-mapping corresponding
to the nonsmooth function, both of which are assumed to be efficiently
implementable. We suppose these can be evaluated exactly, though this does not take
into account finite precision arithmetic. The dimension of the iterates of the EPAPC
algorithm is on the same order of magnitude as with the ADMM/Douglas-Rachford
method, but the individual steps can be run in parallel and, with the exception of the
projection in Step 6, are much less computationally intensive to execute.
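
The predictor, prox, projection and corrector steps of Algorithm 2 can be sketched on a toy instance. The instance and all names are assumptions for illustration: $f(x) = \alpha\|x\|^2$ (so the gradient is $2\alpha x$ with Lipschitz constant $2\alpha$), $g$ the indicator of the box $\{y : |y_i - b_i| \le \gamma\}$, the prox of $\sigma g^*$ evaluated via Moreau's identity, and the projection onto $(\ker A^T)^\perp = \operatorname{range} A$ computed by a dense least-squares solve, which would not scale to the imaging problems of the text.

```python
import numpy as np

def epapc_box(A, b, gamma, alpha=0.01, tau=1.0, iters=200):
    """EPAPC-style sketch for min alpha*||x||^2 subject to |(Ax)_i - b_i| <= gamma."""
    A = np.asarray(A, dtype=float)
    m, n = A.shape
    lipschitz = 2.0 * alpha                          # Lipschitz constant of grad f
    assert 0.0 < tau <= 1.0 / lipschitz
    sigma = 1.0 / (tau * np.linalg.norm(A, 2) ** 2)  # sigma = 1 / (tau * ||A A^T||_2)
    x, y = np.zeros(n), np.zeros(m)                  # y = 0 lies in (ker A^T)^perp
    for _ in range(iters):
        grad = 2.0 * alpha * x
        p = x - tau * (grad + A.T @ y)               # predictor step
        v = y + sigma * (A @ p)
        # prox of sigma*g^* via Moreau: v - sigma * P_C(v / sigma)
        w = v - sigma * np.clip(v / sigma, b - gamma, b + gamma)
        coeffs, *_ = np.linalg.lstsq(A, w, rcond=None)
        y = A @ coeffs                               # projection onto range(A)
        x = x - tau * (grad + A.T @ y)               # corrector step
    return x, y
```

On the small example $A = (1, 1)^T$, $b = (1, 1)$, $\gamma = 0.5$ the iteration settles at $x = 0.5$, where the gradient of $f$ is balanced by $A^T y$.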

For quantitative convergence guarantees of primal-dual methods, additional 
assumptions are required. 


Assumption 3 


(i) The function $f : \mathbb{R}^n \to \mathbb{R}$ is convex and continuously differentiable with Lipschitz
continuous gradient $\nabla f$ (constant $K$), that is for all $x, x' \in \mathbb{R}^n$, we
have

$$\|\nabla f(x) - \nabla f(x')\| \le K \|x - x'\|; \tag{12.25}$$

(ii) The function $f : \mathbb{R}^n \to \mathbb{R}$ is pointwise quadratically supportable (Definition
12.2) at each $\hat{x}$ in the solution set $S^*$.


(iii) There exists a $\sigma > 0$ such that

$$(\forall w \in (\ker A^T)^\perp)(\forall w_0 \in \ker A^T) \quad \frac{1}{2\sigma} \|w_0\|^2 \ge g^*(w) - g^*(w + w_0).$$


The assumption of Lipschitz continuous gradients, Assumption 3(i), is standard, but
stronger than one might desire in general. The assumption is included mainly to
guarantee boundedness of the iterates. Lipschitz continuity of the gradients is enough
for our purposes, however. By the standing Assumption 1 the mapping $A$ is injective
and when $m \le n$, then $A$ has full row rank, and $AA^T$ is invertible. When $m > n$,
$A$ is still injective but $AA^T$ has a nontrivial kernel and care must be taken that
the conjugate function $g^*$ does not decrease too fast in the direction of the kernel
of $A^T$. This is assured by Assumption 3(iii). This assumption comes into play in
Lemma 12.1.

Step (3) of Algorithm 2 can be written more compactly when $g(w) := g(w_1, \ldots, w_M) = \sum_{i=1}^{M} g_i(w_i)$.
In this case, the convex conjugate of a separable sum of functions
is the sum of the individual conjugates: $g^*(w) := \sum_{i=1}^{M} g_i^*(w_i)$. Defining the
matrix $S = \sigma^{-1} I_m$, we immediately get that for any point $\zeta_i \in \mathbb{R}^{m_i}$, $i = 1, \ldots, M$,

$$\operatorname{prox}^S_{g^*}(\zeta) = \big( \operatorname{prox}_{\sigma g_1^*}(\zeta_1), \operatorname{prox}_{\sigma g_2^*}(\zeta_2), \ldots, \operatorname{prox}_{\sigma g_M^*}(\zeta_M) \big).$$

Thus Step (3) of Algorithm 2 can be written in vector notation by $w^k = \operatorname{prox}^S_{g^*}(y^{k-1} + \sigma A p^k)$.
It is possible to use different proximal step constants $\sigma_i$, $i = 1, \ldots, M$, see details in [37].
The choice $\sigma_i = \sigma$ for $i = 1, \ldots, M$ is purely for simplicity
of exposition. The projection onto $(\ker A^T)^\perp$ in (6) is carried out by applying
the pseudo inverse:

$$P_{(\ker A^T)^\perp} = A (A^T A)^{-1} A^T.$$


When $m \le n$ and $A_i$ is full rank for all $i = 1, 2, \ldots, M$, then $\ker A^T = \{0\}$ and
the above operation is not needed. But an unavoidable feature of multi-resolution
analysis, our motivating application, is that $m > n$, so some thought must be given
to efficient computation of $(A^T A)^{-1}$.
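
For small dense problems the projection $A(A^TA)^{-1}A^T w$ can be evaluated without forming the inverse, by solving a least-squares problem. The sketch below is an illustration under that assumption (dense `A`, hypothetical function name); it does not address the large-scale case, which the text leaves open.

```python
import numpy as np

def project_onto_range(A, w):
    """Orthogonal projection of w onto range(A) = (ker A^T)^perp via least squares."""
    coeffs, *_ = np.linalg.lstsq(A, w, rcond=None)
    return A @ coeffs

A = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])  # injective, m = 3 > n = 2
w = np.array([1.0, 2.0, 0.0])
p = project_onto_range(A, w)
p_explicit = A @ np.linalg.solve(A.T @ A, A.T @ w)  # A (A^T A)^{-1} A^T w
```

The residual `w - p` lies in $\ker A^T$, which is the component that Step 6 of Algorithm 2 discards.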

The next technical result, which is new, establishes a crucial upper bound on the 
growth of the Lagrangian with respect to the primal variables. 


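Lemma 12.1 Let Assumption 3 hold and let $((p^k, y^k, x^k))_{k \in \mathbb{N}}$ be the sequence generated
by the EPAPC algorithm. Then for every $k \in \mathbb{N}$ and every $(x, y) \in \mathbb{R}^n \times \mathbb{R}^m$,

$$L(x^k, y^k) - L(x, y^k) \le \frac{1}{2\tau} \left( \|x - x^{k-1}\|^2 - \|x - x^k\|^2 \right) - \frac{1}{2} \left( \frac{1}{\tau} - K \right) \|x^k - x^{k-1}\|^2, \tag{12.26}$$

and

$$L(x^k, y) - L(x^k, y^k) \le \frac{1}{2} \left( \|y - y^{k-1}\|_G^2 - \|y - y^k\|_G^2 - \|y^k - y^{k-1}\|_G^2 \right), \tag{12.27}$$

where

$$G := \sigma^{-1} I_m - \tau A A^T. \tag{12.28}$$

Note that for the choice of $\tau$ given in the parameter initialization of Algorithm 2,
$G \succ 0$.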


Proof The proof of (12.26) follows exactly as in the proof of [37, Lemma 3.1(i)].
The second inequality (12.27) follows exactly as the proof of [37, Lemma 3.1(ii)] in
the case that $\ker A^T = \{0\}$. The case where $\ker A^T$ is nontrivial requires more care.
The proof of this is suppressed in the interest of brevity.


The next intermediate result establishes pointwise quadratic supportability (Def- 
inition 12.2) on bounded sets at all saddle points under Assumptions 1 and 3. 


Proposition 12.11 ([3]) Let $((p^k, y^k, x^k))_{k \in \mathbb{N}}$ be the sequence generated by the
EPAPC algorithm. If Assumptions 1 and 3 are satisfied, then for any primal solution
$\hat{x}$ to the saddle point problem (S), there exists a $\mu > 0$ such that

$$f(x^k) \ge f(\hat{x}) + \langle \nabla f(\hat{x}), x^k - \hat{x} \rangle + \frac{\mu}{2} \|x^k - \hat{x}\|^2 \quad \forall k. \tag{12.29}$$

The constant $\mu$ in Proposition 12.11 depends on the choice of $(x^0, y^0)$ and so depends
implicitly on the distance of the initial guess to the point in the set of saddle point
solutions.

Convergence of the primal-dual sequence is with respect to a weighted norm on
the primal-dual product space built on $G$ in (12.28).

$$u := \begin{pmatrix} x \\ y \end{pmatrix}, \qquad H := \begin{pmatrix} \tau^{-1} I_n & 0 \\ 0 & G \end{pmatrix}, \tag{12.30}$$

where by the assumptions on the choice of $\tau$ given in Algorithm 2, $G \succ 0$. We
can then define an associated norm using the positive definite matrix $H$,
$\|u\|_H^2 = \frac{1}{\tau}\|x\|^2 + \|y\|_G^2$.
We are now ready to state the main result and corollaries, whose proofs can be 
found in [3]. 


Theorem 12.4.1 Let $((p^k, x^k, y^k))_{k \in \mathbb{N}}$ be the sequence generated by the EPAPC
algorithm. Let $\lambda^{+}_{\min}(AA^T)$ denote the smallest nonzero eigenvalue of $AA^T$. If
Assumptions 1 and 3 are satisfied, then there exists a saddle point solution for $L(\cdot, \cdot)$,
the pair $\hat{u} = (\hat{x}, \hat{y})$, with $\hat{y} \in (\ker A^T)^\perp$, and for any $\alpha > 1$ and for all $k \ge 1$, the
sequence $(u^k = (x^k, y^k))_{k \in \mathbb{N}}$ satisfies
uk - all, < I al? (12.31) 
where 
-1 1l—-rTL Amin 2 Amin E 
jamn) C VOUTE) +AA )  PTONmin (AA) (12.32) 


Q , ar; + OAmint (AAT) 


is positive and $\mu > 0$ is the constant of pointwise quadratic supportability of $f$ at $\hat{x}$
depending on the distance of the initial guess to the point $(\hat{x}, \hat{y})$ in the solution set
$S^*$. In particular, $((x^k, y^k))_{k \in \mathbb{N}}$ is Q-linearly convergent with respect to the $H$-norm
to a saddle-point solution.

An unexpected corollary of the result above is that the set of saddle points is a 
singleton. 


Corollary 12.12 (Unique saddle point) Under Assumptions 1 and 3 the solution set
$S^*$ is a singleton.


The above theorem yields the following estimate on the number of iterations 
required to achieve a specified distance to a saddle point. 


Corollary 12.13 Under Assumptions 1 and 3, let $\hat{u} = (\hat{x}, \hat{y})$ be the limit point of the
sequence generated by the EPAPC algorithm. In order to obtain

$$\|x^k - \hat{x}\| \le \varepsilon \quad (\text{resp. } \|y^k - \hat{y}\|_G \le \varepsilon), \tag{12.33}$$

it suffices to compute $k$ iterations, with


210g (1%) 2log () 
k> Zu u resp. k > — aa ; (12.34) 


where $C = \|u^0 - \hat{u}\|_H = \left( \frac{1}{\tau} \|x^0 - \hat{x}\|^2 + \|y^0 - \hat{y}\|_G^2 \right)^{1/2}$ and $\delta$ is given in (12.32).


12.4.1 EPAPC for Statistical Multi-resolution Estimation
of STED Images


An efficient computational strategy for evaluating or at least approximating the projection
$P_{(\ker A^T)^\perp}$ in Step 6 of Algorithm 2 has not yet been established. We report
here preliminary computational results of Algorithm 2 without computing Step 6.
Our results show that the method is promising, though error bounds to the solution
to (S) are not justified without computation of $P_{(\ker A^T)^\perp}$.

In our numerical experiments, the constraint penalty in (S) takes the form $g(y) = \sum_i \iota_{C_i}(y)$
where each $C_i := \{ y : | \sum_{j \in V_i} \omega_i (y_j - b_j) | \le \gamma_i \}$. This is an exact penalty
function, and so solutions to (S) correspond to solutions to $(P_0)$. Using Moreau's
identity (12.6), the prox-mapping is evaluated explicitly in (6) for each constraint by

$$y_i^k = \operatorname{prox}_{\sigma g_i^*}(y_i^{k-1} + \sigma A_i p^k) = y_i^{k-1} + \sigma A_i p^k - \sigma P_{C_i}\!\left( \frac{y_i^{k-1} + \sigma A_i p^k}{\sigma} \right), \quad i = 1, \ldots, M.$$

The proximal parameter is a function of $\tau$ and given by $\sigma = 1/(\tau \|AA^T\|_2)$. More
details can be found in [37, Sect. 4.1].
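
Moreau's identity reduces the prox of $\sigma g^*$ to a projection whenever $g$ is an indicator function. A minimal sketch for the simplest case, where the constraint set is a scalar interval $[lo, hi]$ (an assumption made for illustration; the sets $C_i$ in the text involve weighted sums), reads as follows. The function name is hypothetical.

```python
import numpy as np

def prox_conj_interval(v, sigma, lo, hi):
    """prox_{sigma g*}(v) for g the indicator of [lo, hi], via Moreau's identity.

    prox_{sigma g*}(v) = v - sigma * P_[lo, hi](v / sigma).
    """
    v = np.asarray(v, dtype=float)
    return v - sigma * np.clip(v / sigma, lo, hi)
```

For the interval $[-1, 1]$ the conjugate $g^*$ is the absolute value, and the formula reproduces soft thresholding with threshold $\sigma$, which gives an easy sanity check.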

Here, we also consider the smooth approximation of the $L^1$-norm as the qualitative
objective. The $L^1$-norm is non-smooth at the origin, thus in order to make the
derivative-based methods possible we consider a smoothed approximation of this,
known as the Huber approximation.

The Huber loss function is defined as follows:

$$\|x\|_{1,\alpha} := \sum_{i} \phi_\alpha(x_i), \qquad \phi_\alpha(t) := \begin{cases} \dfrac{t^2}{2\alpha} & \text{if } |t| \le \alpha, \\[4pt] |t| - \dfrac{\alpha}{2} & \text{if } |t| > \alpha, \end{cases} \tag{12.35}$$

where $\alpha > 0$ is a small parameter defining the trade-off between quadratic regularization
(for small values) and $L^1$ regularization (for larger values). The function $\phi_\alpha$ is
smooth with $\frac{1}{\alpha}$-Lipschitz continuous derivative and its derivative is given by

$$\phi_\alpha'(t) = \begin{cases} \dfrac{t}{\alpha} & \text{if } |t| \le \alpha, \\[4pt] \operatorname{sgn}(t) & \text{if } |t| > \alpha. \end{cases} \tag{12.36}$$

Pointwise quadratic supportability of this function at solutions is not unreasonable
but still must be assumed.
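
The Huber function (12.35) and its derivative (12.36) translate directly into vectorized code. The sketch below uses hypothetical function names and sums over all entries of the array.

```python
import numpy as np

def huber(t, alpha):
    """phi_alpha(t): quadratic for |t| <= alpha, linear beyond."""
    t = np.asarray(t, dtype=float)
    return np.where(np.abs(t) <= alpha, t ** 2 / (2.0 * alpha), np.abs(t) - alpha / 2.0)

def huber_grad(t, alpha):
    """Derivative of phi_alpha; Lipschitz continuous with constant 1/alpha."""
    t = np.asarray(t, dtype=float)
    return np.where(np.abs(t) <= alpha, t / alpha, np.sign(t))

def huber_norm(x, alpha):
    """Smoothed 1-norm: sum of phi_alpha over all entries of x."""
    return float(np.sum(huber(x, alpha)))
```

Both branches agree at $|t| = \alpha$ (value $\alpha/2$, slope $\pm 1$), which is what makes the function continuously differentiable.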

We demonstrate our reconstruction of the image inset shown in Fig. 12.1 of size
$n = 64^2$ with the same SMRE model as the demonstration in Sect. 12.3.1. The confidence
level $\gamma_i$ was set to $0.25 \times i$ at each resolution level ($i = 1, 2, 3$). Figure 12.3 (top
left) shows the reconstruction with the $L^2$ function $f(x) = 0.01\|x\|^2$ (compare to
Fig. 12.2). Figure 12.3 (top right) shows the reconstruction with the Huber function
$\|x\|_{1,\alpha}$, where $\alpha = 0.25$. Figure 12.3 (bottom) shows the step size of the primal-dual
pair for each of these regularized problems as a function of iteration. The model
with quadratic regularization achieves a better average rate of convergence, but for
both objective functions the algorithm appears to exhibit R-linear convergence (not
Q-linear). What is not evident from these experiments is the computational effort
required per iteration. Without computation of the pseudo-inverse in step 6, the
EPAPC algorithm computes these results in about 30 s on a 2018-era laptop, compared
to several days for the results shown in Fig. 12.2.


Fig. 12.3 Results of image denoising and deconvolution via Algorithm 2 for statistical multi-resolution
estimation with three levels at iteration 4000 with a $f(x) = 0.01 \times \|x\|^2$ and b $f(x) = \|x\|_{1,0.25}$,
the Huber loss function. Frame c shows the step length $\|u^{k+1} - u^k\|$ where $u^k = (x^k, y^k)$
of the associated sequence as function of the number of iterations


12.5 Randomized Block-Coordinate Primal-Dual Methods 


The previous sections reviewed numerical strategies and structures that yield quan- 
titative estimates of the distance of an iterate to the solution of the underlying vari- 
ational problem. In this section we examine implementation strategies for dealing 
with high dimensional problems. These are implementation strategies because they
do not involve changing the optimization model. Instead, we select at random a 
smaller subset of variables or constraints in the computation of an update in the full- 
dimensional iterative procedure. This is the principal strategy for handling problems 
that, due to their size, must be distributed across many processing and storage units 
(see for instance [26, 38-40] and references therein). We survey here a randomized 
primal-dual technique proposed and analyzed in [2]. The main theoretical question 
to resolve with such approaches is whether, and in what sense, iterates converge to 
a solution to the original problem. We can determine whether the iterates converge, 
but obtaining an estimate of the distance to the solution remains an open problem. 

The algorithm below is a primal-dual method like the algorithms reviewed above,
with the exception that it solves an extension of the dual problem (D):

$$\underset{(x, y) \in \mathbb{R}^n \times \mathbb{R}^m}{\text{minimize}} \; f^*(x) + g^*(y) \quad \text{subject to } x = -A^T y. \tag{$\widetilde{D}$}$$

The main prox operation is computed on the dual objective in (D), that is $f^*(x) + g^*(y)$
with respect to the variables $(x, y) \in \mathbb{R}^n \times \mathbb{R}^m$. The dimension of the basic
operations is unchanged from the previous approaches, but the structure of the sum of
functions allows for efficient evaluation of the prox mapping. Implicit in this is that
the function $f$ is prox friendly. In the algorithm description below it is convenient
to use the convention $f = g_0$, $A_0 = \mathrm{Id}$. The algorithm is based in part on [39].


Algorithm 3: Random Block-coordinate Primal-Dual Algorithm (RBPD) for
(D)


Parameters : Choose $\tau > 0$, $\sigma = (\sigma_0, \sigma_1, \ldots, \sigma_M) \in \mathbb{R}^{M+1}_{++}$.
Initialization: Choose $y^1 = 0 \in \mathbb{R}^m$, and set $x^1 = u^1 = -A^T y^1 = 0 \in \mathbb{R}^n$.
1 for $k = 1, 2, \ldots$ do
2 Choose $i \in \{0, 1, 2, \ldots, M\}$ uniformly at random;
3 $y_i^{k+1} = \operatorname{prox}_{\sigma_i g_i^*}(y_i^k - \sigma_i A_i x^k)$;
k+l _ AT (k+l ky. 
4 ò — A; i = Ji ); 
5 ukt! = uk a Zot; 
6 hth = ukt! + yk 4 get, 


Notice that each iteration of Algorithm 3 requires only two small matrix-vector
multiplications: $A_i(\cdot)$ and $A_i^T(\cdot)$. The methods of the previous sections, in contrast,
worked with the full matrix $A = [A_1^T, \ldots, A_M^T]^T$. This means that all iterations involve
full vector operations. For some applications this might not be feasible, at least on
standard desktop computers, due to the size of the problems. Algorithm 3 uses only
blocks $A_i$ of $A$, therefore each iteration requires fewer floating point operations,
at the cost of having less information available for choosing the next step. This
reduction in the effectiveness of the step is compensated for through larger block-wise
steps. Computation of the step size is particularly simple. This follows the same
hybrid step-length approach developed in [41] for the nonlinear problem of blind
ptychography. In particular, we use step sizes adjusted to each block $A_i$:

$$\tau \sigma_i \lambda_{\max}(A_i^T A_i) < 1 \quad \forall i = 0, 1, \ldots, M.$$
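
The block-wise step size rule can be sketched as follows. The safety factor below 1 and the function name are assumptions made for illustration; the text only requires the strict inequality. The largest eigenvalue of $A_i^T A_i$ is obtained here as the squared spectral norm of a dense block, which is affordable only for small blocks.

```python
import numpy as np

def block_step_sizes(blocks, tau, safety=0.99):
    """Return sigma_i with tau * sigma_i * lambda_max(A_i^T A_i) = safety < 1."""
    sigmas = []
    for A_i in blocks:
        lam_max = np.linalg.norm(A_i, 2) ** 2  # largest eigenvalue of A_i^T A_i
        sigmas.append(safety / (tau * lam_max))
    return np.array(sigmas)
```

Blocks with small norm receive proportionally larger dual steps, which is the compensation mechanism described in the paragraph above.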


Proposition 12.14 (Theorem 1 of [2]) Suppose Assumption 1 holds and let
$\tau \sigma_i \|A_i\|^2 < 1$. Then $(x^k, y^k)$ generated by Algorithm 3 converges to a solution to
(D). In particular, the sequence $(x^k)$ converges almost surely to a solution to (P).


The statement above concerns part (i) of Theorem 1 of [2]. No smoothness is required
of the qualitative regularization $f$. Instead, it is assumed that this function is prox-friendly.
This opens up the possibility of using the 1-norm as a regularizer, promoting,
in some sense, sparsity in the image. No claim is made on the rate of convergence,
though the numerical experiments below indicate that, for regular enough functions
$f$, convergence might be locally linear. This remains to be proved.


12.5.1 RBPD for Statistical Multi-resolution Estimation
of STED Images 


Despite many open questions regarding convergence, random methods offer a way 
to handle extremely large problems. To make a comparison with the deterministic 
approaches above, cycles of the RBPD Algorithm 3 are counted in terms of epochs. 
An epoch contains the number of passes through steps 1-6 of Algorithm 3 required 
before each block is chosen at least once. After k epochs, therefore, the i-th coordinate 
of x will be updated, on average, the same number of times for the randomized 
algorithm as for the deterministic methods. In other words, an epoch for a randomized 
block-wise method is comparable to an iteration of a deterministic method. As the 
RBPD updates only one block per iteration, each iteration is less computationally 
intensive than the deterministic counterparts. However, in our case this efficient
iteration still requires one to evaluate two (possibly) expensive convolution products
(embedded in $A_i x$ and $A_i^T y$). Thus, if these operations are relatively expensive, the
efficiency gain will be marginal. Nevertheless, because of the ability to operate on 
smaller blocks, the randomized method requires, per epoch, approximately half the 
time required per iteration of the deterministic methods. Although the quantitative 
convergence analysis remains open, our numerical experiments indicate that the 
method achieves a comparable step-residual to the EPAPC Algorithm 2 after the 
same number of epochs/iterations. 
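
The definition of an epoch given above can be made concrete with a short simulation. The following is only a sketch under an assumption that is not stated in this passage, namely that blocks are drawn uniformly at random and independently; the function name `passes_per_epoch` is hypothetical. Under that assumption the expected epoch length is $n H_n$, with $H_n$ the n-th harmonic number.

```python
import numpy as np

def passes_per_epoch(n_blocks, rng):
    """Count random block draws until every block was selected at least once.

    Uniform, independent sampling of blocks is an assumption of this sketch.
    """
    seen = set()
    passes = 0
    while len(seen) < n_blocks:
        seen.add(int(rng.integers(n_blocks)))
        passes += 1
    return passes

rng = np.random.default_rng(0)
n_blocks = 1 + 4 + 9  # blocks at the three resolutions described in the text
lengths = [passes_per_epoch(n_blocks, rng) for _ in range(1000)]
mean_length = float(np.mean(lengths))

# Theoretical expectation under uniform sampling: n * H_n
expected = n_blocks * sum(1.0 / k for k in range(1, n_blocks + 1))
```

Comparing `mean_length` with `expected` shows how many single-block passes one epoch costs on average under this sampling model.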

As with the experiments in the previous sections, we use three resolutions, which
results in one block at the highest resolution, four blocks at the next resolution (four 
possible shifts of 2 x 2 pixels), and nine blocks at the third resolution (nine different 
shifts of 3 x 3 pixels). We applied Algorithm 3 with different regularizers f in
($P_0$): the 1-norm $f(x) = \|x\|_1$, the Huber loss function $f(x) = \|x\|_{1,\alpha}$ given by (12.35)
($\alpha = 0.25$), and the squared Euclidean norm. As with the EPAPC experiments, the
function g is given by (12.11) with $g_i$ given by (12.17) for the parameter $q_i = 0.25 \times i$
for i = 1, 2, 3. All of these functions are prox-friendly and have closed-form Fenchel
conjugates. The gain in efficiency over the deterministic EPAPC method proposed
above (without computation of the pseudo-inverse) is a factor of 2.

Fig. 12.4 Results of image denoising and deconvolution using RBPD (Algorithm 3) for statistical
multi-resolution estimation of a 64 x 64 pixel image with three levels at epoch 4000 with
a $f(x) = 0.01\,\|x\|^2$, b $f(x) = \|x\|_{1,0.25}$, the Huber loss function, and c $f(x) = \|x\|_1$. d The step
length $\|u^{k+1} - u^k\|$, where $u^k = (x^k, y^k)$, of the associated sequence as a function of the number
of iterations

Figure 12.4a-c shows the reconstructions on the same 64 x 64 image data used 
in the previous sections. The numerical performance of the algorithm is shown in 
Fig. 12.4d. The more efficient randomization strategy makes it possible to process the
full 976 x 976 pixel image. The result for regularization with the 1-norm
is shown in Fig. 12.5.

Fig. 12.5 Left: original image. Right: image denoising and deconvolution using RBPD (Algorithm
3) for statistical multi-resolution estimation with three levels at epoch 4000 with $f(x) = \|x\|_1$


Chapter 13
Holographic Imaging and Tomography
of Biological Cells and Tissues


Tim Salditt and Mareike Töpperwien 


Abstract This chapter reviews recent progress in propagation-based phase-contrast 
imaging and tomography of biological matter. We include both in-house µ-CT results
recorded in the direct-contrast regime of propagation imaging (large Fresnel numbers 
F), as well as nanoscale phase contrast in the holographic regime with synchrotron 
radiation. The current imaging capabilities starting from the cellular level all the 
way to small animal imaging are illustrated by recent examples of our group, with 
an emphasis on 3D histology. 


13.1 Propagation-Based Phase-Contrast Tomography 


For the high photon energies E of hard X-rays which are needed to penetrate bulk
samples, absorption contrast becomes negligible and phase contrast prevails. This is
simply the result of the energy dependence of the X-ray index of refraction
$n(E, \mathbf{r}) = 1 - \delta(E, \mathbf{r}) + i\beta(E, \mathbf{r})$, which is well suited to describe the propagation of X-rays
in matter as long as the continuum approximation holds, i.e., if scattering angles are
small. One of the advantages of hard X-ray imaging is that the spatial distribution
of the decrement $\delta(\mathbf{r})$ of the real part of the refractive index is proportional to the
corresponding electron density $\rho(\mathbf{r})$. The predominance of phase interaction over
absorption is quantified by the ratio $\delta/\beta \gg 1$, which is particularly large for the low-Z
elements of soft (unmineralized) biological tissues. The resulting phase shift and amplitude
decrement of a beam traversing a resolution element (voxel of side length a) are
$\Delta\phi = -k\delta a$ and $\Delta A = \exp(-k\beta a)$, respectively, where $k = 2\pi/\lambda$ is the wavenumber.
Even for materials where absorption contrast is still
sufficient at large length scales, it will become impossible to distinguish structural
details if the side length a of the voxel is decreased. Hence, high resolution imaging
with hard X-rays always requires phase contrast. It is therefore not surprising that
suitable implementations of phase-contrast radiography and tomography have been
a major research goal over the last two decades, after X-ray sources with sufficient
partial coherence had become available.

All phase-contrast methods rely on wave-optical transformations of the phase
shifts, which the sample induces in the (partially) coherent wavefront, into measurable
intensity patterns by way of interference; see [1, 2] for a review. The
particular geometries and mechanisms by which waves are brought to interfere can
be quite different. The methods can be classified according to the order of phase
contrast. Zero-order methods such as Bonse-Hart interferometry [3] are capable
of measuring the absolute phase shift $\Delta\phi$ between an object and an empty reference
beam. First-order methods such as crystal [4] or grating (Talbot) interferometers [5,
6] are sensitive to the first spatial derivative of the phase $\nabla\phi$ in the object plane.
Finally, second-order methods are based on contrast formation proportional to $\nabla^2\phi$.
This is the case for propagation-based phase contrast, where the beam diffracted
by the object and the unattenuated or weakly attenuated primary
beam interfere to form a defocused ‘image’. As a second-order phase-contrast
method, propagation imaging is particularly well suited for high spatial resolution. 
Further, it does not require any optical components acting as an ‘analyzer’. First- 
order techniques make use of optical components which are scanned or rotated during 
data acquisition, such as in crystal interferometers (diffraction enhanced imaging) 
[4], or in grating (Talbot) interferometers [5, 6]. A particular advantage of Talbot 
interferometry is that along with the phase information it also generates an additional 
and completely separated darkfield image [5]. Furthermore, phase sensitivity is very 
high. A related first-order phase-contrast technique uses edge illumination or coded 
apertures [7, 8]. As in Talbot interferometry, the beam is structured in the object 
plane by a periodic array creating many beamlets. In contrast to a phase grating, 
these are spatially separated and small angular changes in their directions induced 
by the object are recorded downstream by a detector with sharp absorbing edges, 
without interference between the different beamlets. Hence, this technique is also 
applicable in the case of an incoherent source. Finally, in a more recent variant, the 
Talbot grating is replaced by a random speckle producing pattern [9]. 

A major disadvantage of all first-order techniques is the fact that the resolution 
is limited, e.g., by the grating period, aperture size or speckle grain. At the same 
time, the number of images to be acquired is fairly high and can pose a serious chal- 
lenge in terms of acquisition time and dose. Note that tomography already requires 
the acquisition of hundreds or even thousands of projection images. It is therefore 
important to keep the acquisitions per projection angle at a minimum and to record
a sufficient number of resolution elements in parallel, i.e., to use detectors with a 
large number of pixels, and an optical setup where resolution and magnification are 
matched. For high-resolution tomography of biological samples, phase contrast by 
free space propagation thus remains the method of choice. 

A central challenge in propagation-based imaging has been the formulation of 
accurate and efficient phase-retrieval schemes [1, 10, 11]. This has been particu- 
larly difficult in the holographic regime (small Fresnel number F), where the wave 
diffracted from any point in the object reaches all or a large set of detector pixels. 
However, high geometric magnification results in small F and hence one always
ends up in the holographic regime, since the effective pixel size in the object plane
enters quadratically [12]. At the same time, this holographic regime offers highest 
sensitivity to small phase shifts. As we review in this chapter, quantitative phase 
retrieval can be accomplished with a set of four measurements per projection. The 
key concept as introduced by Peter Cloetens and coworkers is to record images at four 
different $F_i$ [10], e.g., by varying the sample/detector position or the photon energy.
In the meantime, the initial limitation to objects with weakly varying phase can be 
overcome by iterative algorithms [13, 14]. In special cases, additional constraints 
such as compact support [15], sparsity [16], or range and dimensionality constraints 
[17] can reduce this set to a single acquisition. 
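
To see why high geometric magnification pushes the measurement into the holographic regime, the following sketch evaluates the standard Fresnel scaling relations for a cone-beam geometry: magnification $M = (z_{01} + z_{12})/z_{01}$, effective propagation distance $z_{\mathrm{eff}} = z_{12}/M$, effective pixel size $p_{\mathrm{eff}} = p/M$ and pixel-based Fresnel number $F = p_{\mathrm{eff}}^2/(\lambda z_{\mathrm{eff}})$. The function name and all numerical values (8 keV photons, 6.5 µm detector pixels, 5 m source-to-detector distance) are illustrative assumptions, not parameters of a specific experiment.

```python
def effective_geometry(z01, z12, pixel_size, wavelength):
    """Fresnel scaling for a point-source (cone-beam) geometry; SI units.

    z01: source-to-sample distance, z12: sample-to-detector distance.
    Returns (M, z_eff, p_eff, F) with F defined with respect to one pixel.
    """
    M = (z01 + z12) / z01            # geometric magnification
    z_eff = z12 / M                  # equivalent parallel-beam distance
    p_eff = pixel_size / M           # pixel size demagnified to object plane
    F = p_eff**2 / (wavelength * z_eff)
    return M, z_eff, p_eff, F

# Illustrative (hypothetical) numbers
wavelength = 1.55e-10   # m, roughly 8 keV
pixel_size = 6.5e-6     # m
z02 = 5.0               # m, fixed source-to-detector distance

M_a, z_eff_a, p_eff_a, F_a = effective_geometry(0.050, z02 - 0.050, pixel_size, wavelength)
M_b, z_eff_b, p_eff_b, F_b = effective_geometry(0.025, z02 - 0.025, pixel_size, wavelength)
```

Moving the sample closer to the source (case b) doubles the magnification and lowers F, because the effective pixel size enters quadratically while the effective distance enters only linearly.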

It is also instructive to briefly compare propagation-based phase contrast to coher- 
ent diffractive imaging (CDI), which is typically carried out in the optical far field. 
Strictly speaking, far-field diffraction can also be counted as a ‘phase-contrast’ technique,
since it is dominated by the Fourier transform of $\delta(\mathbf{r})$, but the term ‘phase contrast’
is mostly used only in full-field radiography, and not in diffractive imaging. How,
then, do CDI and propagation imaging compare, in particular if both record diffraction
patterns at small F? Importantly, holographic imaging exploits the interference
terms $2\,\mathrm{Re}[\psi_0^{*}\psi_s]$ between a scattered wave $\psi_s$ and a reference wave $\psi_0$ (enlarged
primary wave), while CDI uses the far-field diffraction pattern $\psi_s\psi_s^{*}$ [18-23], without
additional mixing with a reference wave. This important difference changes the way 
in which phase information is encoded in the intensity images, including the mathe- 
matical nature of the phase problem [24]. Furthermore, in near-field imaging, a weak 
scattered signal can be amplified high above background signals of residual scatter. 
These differences could in principle also affect the dose-resolution relationship, as 
we will further discuss later on. In this chapter, we show how propagation-based 
phase-contrast imaging has now matured into a powerful tool for 3D imaging of biological
matter. We include both in-house µ-CT results, where partial coherence has
become sufficient to observe edge enhancement in the direct-contrast regime of propagation
imaging (large Fresnel numbers F), and high-resolution phase contrast in
the holographic regime with synchrotron radiation. Recent studies on biological cells
and tissues performed by our group serve as examples for the current capabilities of 
phase-contrast imaging and tomography. 
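
The difference between the holographic and the CDI measurement discussed above can be made explicit by a short expansion. This is a sketch in notation assumed here, with $\psi_0$ the reference (primary) wave and $\psi_s$ the scattered wave in the detection plane:

```latex
\begin{align}
I_{\mathrm{holo}} &= |\psi_0 + \psi_s|^2
  = |\psi_0|^2 + 2\,\mathrm{Re}\!\left[\psi_0^{*}\psi_s\right] + |\psi_s|^2, \\
I_{\mathrm{CDI}} &= |\psi_s|^2 = \psi_s\psi_s^{*}.
\end{align}
```

For weak scattering, $|\psi_s| \ll |\psi_0|$, the cross term is linear in $\psi_s$ and can exceed $|\psi_s|^2$ by a factor of up to $2|\psi_0|/|\psi_s|$, which is the amplification of a weak signal by the reference wave.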


13.2 Nano-CT Using Synchrotron Radiation: Optics, 
Instrumentation and Phase Retrieval 


13.2.1 Cone-Beam Holography 


X-ray propagation imaging with nanoscale resolution requires a correspondingly 
small focus of the X-ray beam, which is today only possible using synchrotron radi- 
ation. The divergent cone-beam behind the focus is then exploited for illumination 
and recordings in the holographic regime. We therefore use the term ‘holography’
or ‘holographic X-ray imaging’ synonymously for this type of high-resolution propagation
imaging. The properties of the beam, which is also denoted as the probe in
the field of coherent X-ray imaging, are hence essential for holographic imaging. 
High resolution and quantitative phase contrast can only be achieved by efficient 
nano-focusing optics, sufficient coherence and smooth wavefronts [25-28]. These 
properties are certainly beneficial for all coherent X-ray techniques, albeit to a
different degree. Coherent diffractive imaging (CDI) in the far field, for example, is
less sensitive to wavefront errors, but requires higher coherence [29]. 

In holography, aberration-free image formation relies on the quasi-spherical 
nature of wavefronts, since in data treatment one tacitly assumes spherical wave- 
fronts in order to apply the Fresnel scaling theorem [2]. A small focus in a coherent 
probe also warrants a large numerical aperture in the illuminating wavefront. This 
directly affects the spatial resolution and also facilitates high geometric magnifi- 
cations M, and hence small effective pixel sizes, in the case of a limited detector 
distance. Typical values for the focus-to-sample distance in high-resolution hologra- 
phy range between a few millimeters and several centimeters. Unfortunately, X-ray 
nano-focusing is associated with significant wavefront distortions, see also [30] for 
an overview of different X-ray focusing optics. These wavefront artifacts violate the 
idealizing assumptions made on the probe in the course of image reconstruction, such 
as point-source emission or distortion-free wavefront. The validity of these assumptions
has recently been investigated, showing that their violation leads to reduced resolution
and image quality [31, 32]. To avoid this, additional optical filtering and wavefront
cleaning can be used. Alternatively, phase-retrieval schemes have to be generalized 
to non-ideal illumination conditions. In short, either hardware or software solutions, 
or a combination of both, are required. 

For the latter case, the ptychographic concept of simultaneous probe and object 
reconstruction was recently generalized to near-field (propagation) imaging 
[26, 33-36]. Ptychographic algorithms were initially formulated only for confined 
probes (but extended objects), a setting which is typical for far-field coherent diffractive
imaging [23, 37-41]. A generalization to extended illumination wavefronts
was given in [36, 42], but a wavefront diffuser was required in order to increase 
the diversity of the probe. Thus, only the artificially modified wavefront and not the 
‘natural’ probe could be recovered. This limitation was lifted by introducing lon- 
gitudinal scanning of the object in [34], as well as the combination of lateral and 
longitudinal scans [33] to generate diversity in the data. Reconstructions of the illu- 
mination produced by a set of Kirkpatrick-Baez (KB) mirrors in the imaging plane 
were presented in [35] for the upgraded beamline ID 16a of the European Synchrotron 
Radiation Facility (ESRF) in Grenoble [43], and in [26] for the holography endstation
GINIX (Göttingen Instrument for Nano-Imaging with X-rays, cf. Fig. 13.1a) [44]
at the P10 beamline of PETRAIII (DESY, Hamburg). As an alternative to shifting 
the sample, probe reconstruction was also achieved without any object in the beam 
by translating the detector [26], using an improved multiple magnitude projections 
(MMP) scheme [31, 45, 46]. The disadvantage of these approaches is that multiple 
images with different sample translations have to be recorded for each tomographic 
projection. Further, they only work if the empty beam remains temporally stable.

Fig. 13.1 Experimental realization of holographic imaging at the GINIX setup installed at the
P10 beamline at PETRAIII (DESY, Hamburg). a The X-rays are generated in an undulator and
monochromatized by a double-crystal Si(111) channel-cut monochromator. Subsequently, they are
focused by a set of Kirkpatrick-Baez (KB) mirrors. An X-ray waveguide is placed in the KB focal
plane as a coherence filter and to increase the numerical aperture for holographic imaging. The
sample is mounted on a fully motorized sample tower at a distance $z_{01}$ behind the waveguide
and the evolving intensity distributions are recorded in the detection plane at distance $z_{12}$ behind
the sample. b A waveguide can be either realized by combining two 1d devices, consisting of a 
multilayer structure with carbon as guiding layer, or by etching a channel into a silicon wafer and 
bonding a second wafer on top, leading to a closed channel with air as guiding layer. c By introducing 
these waveguides into the setup, the disturbed illumination (left), caused by small sub-nanometer 
irregularities on the surface of the KB-mirrors, is spatially filtered, resulting in a smooth illumination 
in which high-frequency variations are suppressed (right). Scale bars: 0.5 mrad. Adapted from [47] 


13.2.2 Waveguide Optics and Imaging 


A hardware solution to avoid artifacts due to a non-ideal illumination is given by
additional optical elements for coherence and wavefront filtering. This can be accomplished
by X-ray waveguides [48-52], positioned in the focal plane of the focusing
optics. Owing to the fact that the X-ray focus is smaller at the exit of the waveguide
than at its entrance, this also offers the important advantage of increased numerical
aperture. The significant challenges in fabricating two-dimensionally confining X-ray
waveguides of suitable quality were solved by crossing two planar one-dimensional
waveguides [53, 54], or by advanced lithographic techniques and wafer bonding
[55, 56], cf. Fig. 13.1b, including advanced schemes with tapered [57] or curved
waveguides [58]. As shown in [12, 14, 15, 25], X-ray waveguides provide highly
coherent, well controlled, smooth and quasi point-like illumination for nanoscale X-ray
imaging. A comparison between the illumination in the imaging plane provided
by a waveguide and a set of KB mirrors is depicted in Fig. 13.1c. The 2D imaging
capabilities of this filtered wavefront for biological tissues are demonstrated in
Fig. 13.2 using the example of freeze-dried Deinococcus radiodurans cells. The first
tomography application using waveguides was demonstrated for bacterial cells in
[59]. Significantly increased 3D image quality has been achieved in the meantime
due to various improvements on different levels, starting from the waveguide optics,
the alignment and image processing procedure, the recording and detection scheme,
and finally the phase retrieval [14]. The current state of the art for tomography of
biological cells is reported in [14] and for biological tissues in [60, 61].

Fig. 13.2 Waveguide holographic imaging at the single-cell level. a Normalized hologram of
Deinococcus radiodurans cells (freeze-dried), obtained in a single recording with 8 s dwell time
along with b the iterative mHIO phase reconstruction. c mHIO reconstruction of (initially) living
cells in solution. Each frame was recorded during 80 s (every other frame is shown). Gradual
changes in the densities (see arrow) are observed in response to successive irradiation. Scale bars:
4 µm. Adapted from [25]


13.2.3 Dose-Resolution Relationship


The resolution values as obtained for waveguide-based holography have reached the 
range of 50 nm (half-period resolution). Fitting of line cuts through lithographic test 
patterns has given FWHM values down to 25 nm, but based on limited numerical 
aperture twice this value seems a more realistic resolution estimate. This is still significantly
coarser than typical values for ptychography, e.g., 10-15 nm for the same
test pattern imaged in [62, 63]. Note that resolution determination of near-field holo- 
graphic imaging deserves a careful consideration [64], and shows particularities not 
known from far-field imaging, such as a dependence of the maximum theoretical 
resolution on the object position in the field of view. The analysis in [64] also indi- 
cates the importance of the numerical aperture in holographic imaging. From an 
experimental point of view, benchmark experiments on realistic biological samples 
are required, beyond the typical demonstrations for specially designed test charts, 
which are highly contrasted and for which high resolution can be much more easily 
achieved. 

For this reason, ptychographic and holographic reconstructions have been com- 
pared for the same objects, namely Deinococcus radiodurans bacterial cells. This 
bacterium had served early on as a first demonstration that ptychographic imaging 
with hard X-rays is possible for low contrast biological samples [66]. 3D reconstruc- 
tions by ptycho-tomography were presented in [62], holographic imaging in [25], 
and an early holo-tomography result in [59]. In these studies the point was made 
that X-ray imaging yields quantitative electron density contrast. Therefore, the long 
debated (mass) density in bacterial nucleoids could be addressed in quantitative terms 
[25, 62, 66, 67]. 

These studies also provided a starting point for considerations of the dose-resolution
relationship. Surprisingly, they pointed to an advantage of holography in
terms of dose with respect to (far-field) ptychography. Since experiments are never
completely free of uncontrolled parameters, this issue was also studied by analytical
and numerical work. Starting with random binary bitmap patterns, the information
content of near-field and far-field diffraction patterns was compared, as well as
reconstructibility as a function of fluence (see [68] for a precise definition of
‘reconstructibility’), based on a maximum likelihood approach. Earlier work had already
used random bitmap patterns and mutual information theory for a one-dimensional
model [69]. More realistic phantoms and reconstruction algorithms were used in
[65] for numerical simulations of the dose-resolution relationship for near- and far-field
coherent diffractive imaging (cf. Fig. 13.3). In this study, a dose advantage for
near-field phase retrieval over CDI was found. This conclusion can, however, only
be made for the particular reconstruction algorithms which have been used. Other
authors who compared holographic and ptychographic reconstruction did not find any
considerable differences [70]. Numerical simulations have also considered the effect
of finite partial coherence, multi-modal wavefield reconstructions [71], as well as
the coherence-resolution relationship [29]. Analytical scaling of the dose-resolution
curves as well as experimental results on the resolution limits of biological objects
due to radiation damage are given in [72, 73].

Fig. 13.3 Dose-resolution relationship in holographic and coherent diffractive imaging. a Example
reconstructions for 200 photons per pixel after 200 iterations of RAAR ($\beta_0 = 0.99$, $\beta_{\max} = 0.75$
and $\beta_s = 150$ iterations) for near-field holography (NFH) and coherent diffractive imaging (CDI)
using a support and pure phase object constraint. b Fourier ring correlation of the reconstructions.
The intersection with the 1/2-bit threshold curve determines the maximum frequency that can be
resolved. c Resolution as a function of dose. Scale bar: 50 pixels. Adapted from [65]
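
The Fourier ring correlation (FRC) used as resolution criterion in Fig. 13.3b can be sketched as follows. This is a generic illustration in Python/NumPy and not the code used in [65]: the function names are hypothetical, the images are assumed to be square and equally sized, and the 1/2-bit threshold is written in a commonly quoted closed form that depends on the number of Fourier pixels per ring.

```python
import numpy as np

def fourier_ring_correlation(img1, img2):
    """FRC of two equally sized square images.

    Returns (freq, frc, counts): ring frequency in cycles per pixel,
    correlation per ring, and number of Fourier pixels per ring.
    """
    assert img1.shape == img2.shape and img1.shape[0] == img1.shape[1]
    n = img1.shape[0]
    f1 = np.fft.fftshift(np.fft.fft2(img1))
    f2 = np.fft.fftshift(np.fft.fft2(img2))
    # integer ring index of every Fourier pixel (zero frequency at n // 2)
    y, x = np.indices((n, n)) - n // 2
    ring = np.rint(np.hypot(x, y)).astype(int)
    n_rings = n // 2
    mask = ring < n_rings
    idx = ring[mask]
    cross = np.bincount(idx, weights=(f1 * np.conj(f2)).real[mask], minlength=n_rings)
    p1 = np.bincount(idx, weights=(np.abs(f1) ** 2)[mask], minlength=n_rings)
    p2 = np.bincount(idx, weights=(np.abs(f2) ** 2)[mask], minlength=n_rings)
    counts = np.bincount(idx, minlength=n_rings)
    return np.arange(n_rings) / n, cross / np.sqrt(p1 * p2), counts

def half_bit_threshold(counts):
    """1/2-bit threshold curve as a function of the pixels per ring."""
    s = np.sqrt(counts)
    return (0.2071 + 1.9102 / s) / (1.2071 + 0.9102 / s)

# Hypothetical test data: identical images versus independent noise images
rng = np.random.default_rng(1)
image = rng.standard_normal((64, 64))
freq, frc_same, counts = fourier_ring_correlation(image, image)
_, frc_noise, _ = fourier_ring_correlation(image, rng.standard_normal((64, 64)))
threshold = half_bit_threshold(counts)
```

In practice the two inputs would be independent reconstructions of the same object, and the resolution estimate is the first frequency at which the FRC curve drops below `threshold`.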


13.2.4 Phase Retrieval Algorithms 


In many cases, phase-contrast experiments are carried out in the direct-contrast 
regime where phase-contrast effects are visible as edge enhancement and phase 
reconstruction can be performed by linearization of the transport of intensity equa- 
tion (TIE) along the propagation direction z [74, 75]. In the holographic regime, 
however, where the phase contrast transfer and hence phase sensitivity is highest, 
these reconstruction algorithms fail. An approach for the inversion of phase-contrast 
effects in this regime based on the contrast transfer function (CTF) was proposed 
20 years ago by Cloetens and coworkers [10]. It relies on several different measure- 
ment planes with varying Fresnel numbers and it is valid for weakly absorbing or 
pure-phase objects with a slowly varying phase. In the next section, this so-called 
CTF approach and its limits will be discussed in detail, and a particular iterative 
approach will be presented which is well suited to replace CTF with a larger range of 
applicability [13]. Before we do so, however, we give a broader overview of recent
work on iterative algorithms, including those which are designed for single distance 
acquisition or are based on different constraint sets (see also Chap. 6). To this end, 
we already assume a general understanding of how iterative projection algorithms 
work, see for example the tutorial chapter on basic X-ray propagation and imaging. 
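
As a minimal illustration of the CTF idea for a weak pure-phase object, the sketch below simulates holograms at several Fresnel numbers and inverts them by regularized least squares. It is a generic textbook-style formulation under assumptions made here, not the implementation of [10]: the Fresnel numbers are defined with respect to one pixel, the propagator is $\exp(-i\chi)$ with $\chi = \pi\nu^2/F$, the linearized model is $\tilde{I} = \delta_D + 2\sin(\chi)\,\tilde{\phi}$, and the function names, test object and regularization parameter `alpha` are hypothetical.

```python
import numpy as np

def fresnel_chi(shape, fresnel_number):
    """chi = pi * nu^2 / F, nu in cycles per pixel, F in pixel units."""
    nu_y = np.fft.fftfreq(shape[0])[:, None]
    nu_x = np.fft.fftfreq(shape[1])[None, :]
    return np.pi * (nu_x**2 + nu_y**2) / fresnel_number

def propagate(wave, fresnel_number):
    """Paraxial free-space propagation with periodic boundary conditions."""
    kernel = np.exp(-1j * fresnel_chi(wave.shape, fresnel_number))
    return np.fft.ifft2(np.fft.fft2(wave) * kernel)

def ctf_phase_retrieval(holograms, fresnel_numbers, alpha=1e-6):
    """Least-squares CTF inversion for a weak pure-phase object.

    holograms: intensities normalized to unit mean illumination.
    """
    num = np.zeros(holograms[0].shape, dtype=complex)
    den = np.zeros(holograms[0].shape)
    for intensity, F in zip(holograms, fresnel_numbers):
        s = np.sin(fresnel_chi(intensity.shape, F))
        num += s * np.fft.fft2(intensity - 1.0)
        den += 2.0 * s**2
    # alpha regularizes frequencies where all CTFs are (nearly) zero
    return np.fft.ifft2(num / (den + alpha)).real

# Hypothetical weak phase object: a few Gaussian blobs of about -0.01 rad
rng = np.random.default_rng(0)
n = 128
yy, xx = np.indices((n, n))
phi = np.zeros((n, n))
for _ in range(5):
    cy, cx = rng.uniform(40, 88, size=2)
    phi -= 0.01 * np.exp(-((yy - cy) ** 2 + (xx - cx) ** 2) / (2 * 3.0**2))

fresnel_numbers = [0.020, 0.0235, 0.029, 0.041]
holograms = [np.abs(propagate(np.exp(1j * phi), F)) ** 2 for F in fresnel_numbers]
phi_rec = ctf_phase_retrieval(holograms, fresnel_numbers)

# The mean phase is not encoded in the holograms, so compare without it
error = float(np.max(np.abs(phi_rec - (phi - phi.mean()))))
```

Using several Fresnel numbers fills the zero crossings of the individual $\sin(\chi)$ transfer functions. For objects with larger phase variations the linearization breaks down, which is the limitation addressed by the iterative schemes.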

We first want to mention the so-called Holo-TIE algorithm [12], which is a one-step
direct reconstruction scheme operating on two images recorded at slightly different
(defocus) distances z from the source. By Holo-TIE, the TIE approach [75, 76]
is extended towards arbitrary defocus distances z, including the holographic regime 
addressed here. This is important since TIE phase retrieval enjoys much success in 
the direct-contrast regime, and can thus be seamlessly extended, given at least two
recordings. Importantly, Holo-TIE does not rely on assumptions on material composition
nor on linearization of the specimen’s optical constants, as is typically the
case in conventional non-iterative reconstruction schemes. An extension of the initial
Holo-TIE reconstruction was presented in [14], operating on four measurements
recorded at different, well-chosen distances (or Fresnel numbers). In order to further 
improve reconstruction quality, if necessary, Holo-TIE reconstructions can also be 
used to initialize iterative algorithms. 

Next, we consider iterative algorithms for single-distance acquisitions. For this 
case, the so-called modified hybrid-input-output (mHIO) algorithm [15] was pro- 
posed as an iterative reconstruction for (single distance) X-ray holograms. The des- 
ignation ‘modified’ refers to the fact that the HIO was well established in CDI, and 
was modified to the near-field case. The mHIO uses a support estimate from the 
holographic reconstruction to slowly push the phase outside the support to zero. 
Importantly, it was shown that this algorithm can fill in the lost information due to 
the zero crossings of the oscillatory CTF. Hence, samples of arbitrary composition 
can be phased, overcoming the common assumption that the $\delta(\mathbf{r})$ and $\beta(\mathbf{r})$ components
of the complex index of refraction $n(\mathbf{r}) = 1 - \delta(\mathbf{r}) + i\beta(\mathbf{r})$ are coupled, which
strictly is true only for samples consisting of a single material. Support estimation
in mHIO was demonstrated in a fully automated manner for tomography in [59]. 
In [77], the scheme of mHIO was compared to RAAR, again using the same single 
distance holographic data and constraints. Significant improvements were provided 
by an iteratively regularized Gauss-Newton (IRGN) method, reaching higher resolu- 
tion and image quality than mHIO for noisy data [78, 79]. In [14], different iterative 
phase-retrieval techniques were compared for the holographic regime, using both 
numerical simulation and experimental data of biological cells. 

Finally, ptychographic algorithms should be mentioned, which offer a solution 
for cases where the separation between object and probe is challenging (for example 
if intensity minima occur in the illumination wavefront) or if the constraint set is 
insufficient. This can be the case, e.g., if no support constraint is available as the 
object covers the entire field of view and further object constraints, such as sparsity, 
cannot be applied either. In this case, additional data is required, for example by 
translating the object with respect to the probe. For such a scan series, ptychographic 
algorithms can exploit the constraint of separability, which typically offers high 
reconstruction quality for object and probe. This, however, comes at the price of
a significantly increased number of acquisitions, by a factor on the order of 10 or 
more. This is impractical for tomography, in particular with large fields of view, as 
these also lead to a high number of projection angles in order to fulfill the sampling 
criterion. 


13.3 CTF-based Reconstruction and Its Limits 


While the assumption of a weakly varying phase in CTF-based phase retrieval is a
reasonable approximation for unstained biological tissues [47], samples with a larger
variance in electron density lead to artifacts in the reconstructed phase distributions.
An alternative phase-retrieval approach is given by iterative algorithms, in which the
phase distribution is reconstructed by alternately propagating between the object and
measurement plane and applying appropriate constraints such as a compact support
[80]. As many samples are not compactly supported, however, suitable phase-retrieval
algorithms for such samples have been lacking.

Fig. 13.4 Procedure for the alignment of projection images in multi-distance cone-beam
phase-contrast imaging using a butterfly as test object. a By acquiring images at varying source-to-object
distance $z_{01}$, while the source-to-detector distance $z_{02}$ is kept constant, different Fresnel numbers
can be reached in the resulting projections. The simultaneously changing magnification, however,
also alters the effective pixel size and field of view of the single images. b To account for a varying
magnification, all images are scaled to the effective pixel size of the projection with the highest
magnification. Subsequently, the images are aligned to each other in Fourier space to identify
overlapping regions. By subsequently cutting all projections accordingly, projection images with
the same field of view and effective pixel size but varying Fresnel numbers can be obtained, well
suited for the application of multi-distance phase-retrieval approaches

In [13], a simple approach of iterative alternating projections is introduced which 
can fill this gap, providing superior image quality for extended samples which do not 
obey the assumption of a slowly varying phase. The input data corresponds to the 
same set of measurements which is typically used in CTF-based phase retrieval with 
images acquired in N = 4 measurement planes. Due to the cone-beam geometry of 
the setup, these projection images are recorded at varying magnification and hence 
fields of view and have to be aligned to each other prior to the phase-retrieval step. This 
can be implemented according to the scheme presented in Fig. 13.4. In a first step,
the different projections are scaled to the effective pixel size of the projection which
was recorded at highest magnification. Subsequently, matching fields of view are 
determined via a cross-correlation of the corresponding projection and its predecessor 
in Fourier space [81]. As the variation of the source-to-sample distance also results in 
a variation of the geometric magnification, images are acquired at different Fresnel 
numbers and hence, the occurring interference fringes vary in all projections. Since 
this can affect the quality of image alignment, it can be of advantage to use single- 
step CTF reconstructions instead of the raw projections for alignment. In a last step, 
all images are cropped to the same field of view, leading to projection images with 
the same effective pixel size but varying Fresnel numbers, which are well suited for 
the application of multi-distance phase-retrieval approaches. 
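
The registration step of this alignment procedure can be sketched as a cross-correlation evaluated in Fourier space. The example below is a simplified illustration and not the procedure of [81]: it assumes that both images have already been rescaled to a common effective pixel size, it only recovers integer-pixel shifts under periodic boundary conditions, and the function name `register_shift` is hypothetical.

```python
import numpy as np

def register_shift(reference, moving):
    """Integer shift (dy, dx) to apply to `moving` so that it matches `reference`.

    The shift is read off the peak of the circular cross-correlation,
    which is computed via FFTs.
    """
    cross_power = np.fft.fft2(reference) * np.conj(np.fft.fft2(moving))
    corr = np.fft.ifft2(cross_power).real
    peak = np.unravel_index(np.argmax(corr), corr.shape)
    # map peak indices in [0, n) to signed shifts in (-n/2, n/2]
    return tuple(int(p) if p <= n // 2 else int(p) - n
                 for p, n in zip(peak, corr.shape))

# Hypothetical test: displace a random image by a known amount
rng = np.random.default_rng(2)
reference = rng.standard_normal((96, 96))
moving = np.roll(reference, shift=(5, -7), axis=(0, 1))

shift = register_shift(reference, moving)
aligned = np.roll(moving, shift=shift, axis=(0, 1))
```

After all projections have been registered in this way, the common field of view follows from the overlap of the shifted frames, and each image can be cropped accordingly.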

In Fig. 13.5, the results of the CTF approach for homogeneous objects [82] as well 
as the iterative phase retrieval are shown on a 2D projection of polystyrene spheres 
with a diameter of 15 µm and a 3D reconstruction of an epon-embedded Golgi-Cox
stained brain slice of a wild type mouse hippocampus [60]. In both examples, the CTF 
approach results in severe artifacts due to the violation of the underlying assumptions 
whereas the iterative approach leads to superior quality in the reconstructed phase 
distribution, especially in the 2D case of the polystyrene spheres. 

So far, the long computation time of iterative algorithms has impeded their
application to large data sets. However, with the advent of new computational
hardware, especially GPUs, this limitation no longer applies, making iterative
reconstructions feasible for a large variety of measurements in which the assumptions
of the CTF approach are violated.
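
To make the idea of alternating projections concrete, the following minimal sketch
cycles through the measurement planes and, in each plane, replaces the modulus of the
propagated wave field by the measured one. It is a generic illustration and not the
specific algorithm of [13]: the Fresnel propagator in pixel units, the parametrization
of each plane by its Fresnel number, the plain cyclic update and all function names are
assumptions made for this example.

```python
import numpy as np


def fresnel_propagate(field, fresnel_number):
    """Paraxial free-space propagation; a negative Fresnel number propagates back."""
    ny, nx = field.shape
    fy = np.fft.fftfreq(ny)[:, None]   # spatial frequencies in cycles per pixel
    fx = np.fft.fftfreq(nx)[None, :]
    kernel = np.exp(-1j * np.pi * (fx**2 + fy**2) / fresnel_number)
    return np.fft.ifft2(np.fft.fft2(field) * kernel)


def magnitude_projection(exit_wave, intensity, fresnel_number):
    """Enforce the measured intensity in one detection plane."""
    detected = fresnel_propagate(exit_wave, fresnel_number)
    detected = np.sqrt(intensity) * np.exp(1j * np.angle(detected))
    return fresnel_propagate(detected, -fresnel_number)


def alternating_projections(intensities, fresnel_numbers, n_iter=50):
    """Cycle the magnitude projection over all measurement planes."""
    exit_wave = np.ones(intensities[0].shape, dtype=complex)
    for _ in range(n_iter):
        for intensity, f in zip(intensities, fresnel_numbers):
            exit_wave = magnitude_projection(exit_wave, intensity, f)
    return exit_wave


# Synthetic example with four planes (all values are assumed for illustration).
rng = np.random.default_rng(1)
truth = np.exp(1j * 0.1 * rng.random((32, 32)))
fresnel_numbers = [0.05, 0.07, 0.1, 0.2]
intensities = [np.abs(fresnel_propagate(truth, f)) ** 2 for f in fresnel_numbers]
reconstruction = alternating_projections(intensities, fresnel_numbers, n_iter=5)
```

Each iteration consists only of FFTs and pointwise operations, which is why such schemes
map well onto GPUs.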


13.4 Laboratory µ-CT: Instrumentation and Phase
Retrieval


Phase-contrast imaging based on free-space propagation between object and detector
requires a high degree of spatial coherence. Therefore, it was long considered to be an
imaging technique which is only applicable at large-scale synchrotron facilities. With
the development of microfocus X-ray sources, however, the degree of spatial coherence
could be considerably increased, as the lateral coherence length $L_\perp = \lambda z_{01}/s$
is proportional to the source-to-sample distance $z_{01}$, and inversely proportional to
the source size $s$. In order to observe phase-contrast effects for object features of
spatial length scale $d$, we must have $L_\perp > d$ in the source plane. The flux density
on the sample, $I \propto s/z_{01}^2$, is directly proportional to the source size, since the
power loading of the target is proportional to the linear dimension $s$ of the source (and
not its area!). Hence, intensity is maximized if the coherence condition is just fulfilled,
i.e., if the object is moved to the minimum distance which still fulfills the coherence
constraint, $z_{01} = sd/\lambda$. Inserting this distance in the flux density expression gives
$I \propto \lambda^2/(sd^2)$, showing that the source size ought to be reduced in order to
maximize the lateral coherence (even if this reduces the power) and that the wavelength
should be increased to the maximum value compatible with object transmission.

Fig. 13.5 Comparison between CTF-based and iterative phase retrieval in 2D and 3D. a, b Phase
distribution of a layer of polystyrene spheres with a diameter of 15 µm obtained by the CTF
approach. The region marked by the red rectangle in (a) is shown at higher magnification in (b).
c, d Iteratively obtained reconstruction of the same data set. The magnified part in (d) shows the
unwrapped phase obtained by Matlab’s unwrap function. e, f Virtual slice through the density of
a Golgi-Cox stained mouse hippocampus obtained from projections reconstructed according to the
CTF (e) as well as iterative approach (f). The insets show regions marked by rectangles at higher
magnification. Scale bars: 100 µm (a, c, d, f) and 15 µm (b, e). Adapted from [13]
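
As a numerical illustration of the coherence argument of this section, the following
sketch evaluates the lateral coherence length (wavelength times source-to-sample distance,
divided by source size), the minimum source-to-sample distance (source size times feature
size, divided by wavelength) and the resulting flux scaling (wavelength squared over
source size times feature size squared). The function names and the example values (a
wavelength of 1 Å, a 10 µm source and 1 µm features) are assumptions chosen for
illustration only.

```python
def lateral_coherence_length(wavelength, z01, source_size):
    """L_perp = lambda * z01 / s (all lengths in metres)."""
    return wavelength * z01 / source_size


def min_source_to_sample_distance(source_size, feature_size, wavelength):
    """Smallest z01 for which L_perp equals the feature size d."""
    return source_size * feature_size / wavelength


def relative_flux_density(wavelength, source_size, feature_size):
    """Flux density at the minimum distance, up to a constant: lambda^2 / (s d^2)."""
    return wavelength**2 / (source_size * feature_size**2)


wavelength = 1e-10    # assumed: 1 Angstrom
source_size = 10e-6   # assumed: 10 micrometre source
feature_size = 1e-6   # assumed: 1 micrometre features

z01 = min_source_to_sample_distance(source_size, feature_size, wavelength)
print(f"minimum z01 = {z01:.2f} m")
print(f"L_perp there = {lateral_coherence_length(wavelength, z01, source_size):.2e} m")
```

Halving the source size in `relative_flux_density` doubles the returned value, which is
the scaling argument made in the text.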


The coherence length, however, is not the only factor which has to be taken into
account. In [83] it was shown that diffraction effects behind the object have to be
constrained to sufficiently small angles, such that the coherent wave scattered by
an object feature of size $d$ does not scatter ‘out’ of the coherent radiation cone.
This sets a limit for the scattering angle $\alpha$, or equivalently the ‘shearing length’
$L_{\mathrm{shear}} = z_{12}\alpha$. This finally results in the condition
$L_{\mathrm{shear}}/L_\perp = (M-1)s/(Md) < 1$ (with the geometrical magnification
$M = z_{02}/z_{01}$). Along with $L_\perp/d > 1$ this must be fulfilled in order for the
associated phase-contrast effects to become measurable. Further, visibility increases
towards smaller values of $L_{\mathrm{shear}}/L_\perp$ and larger $L_\perp/d$. In
Fig. 13.6a, the ratio $L_{\mathrm{shear}}/L_\perp$ is plotted as a function of the
geometrical magnification $M$ and the object feature size $d$ for a constant source size
$\mathrm{FWHM}_{\mathrm{src}} = 10$ µm. For feature sizes below 10 µm, the lateral coherence
length is thus insufficient, even if the object was illuminated coherently, since the
diffraction angle would be too large, and the signal would not interfere coherently with
the primary wave in the detection plane. Only when using the ‘inverse’ geometry at low $M$
can the shearing condition $L_{\mathrm{shear}}/L_\perp < 1$ be met for feature sizes smaller
than 10 µm.
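
The shearing condition can be evaluated directly. The following sketch computes the ratio
of shearing length to lateral coherence length, (M - 1)s/(Md), and the largest
magnification that still satisfies the condition, obtained by solving the inequality for
M. The function names and the example values are assumptions for illustration.

```python
import math


def shear_ratio(magnification, source_size, feature_size):
    """L_shear / L_perp = (M - 1) s / (M d)."""
    return (magnification - 1.0) * source_size / (magnification * feature_size)


def max_magnification(source_size, feature_size):
    """Largest M with L_shear / L_perp < 1, i.e. M < s / (s - d) for s > d."""
    if source_size <= feature_size:
        return math.inf  # the ratio stays below 1 for every magnification
    return source_size / (source_size - feature_size)


# Example: 10 um source, feature sizes in the same unit (assumed values).
for d in (2.0, 5.0, 10.0):
    print(d, max_magnification(10.0, d))
```

For features smaller than the source, the admissible magnification stays close to 1,
which corresponds to the inverse geometry discussed in the text.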

Fig. 13.6 Considerations of partial spatial coherence and resolution in a microfocus setup. a Effect
of coherence on phase contrast. The ratio $L_{\mathrm{shear}}/L_\perp$, as defined by [83], is an
indicator for the visibility of phase-contrast effects, with enhanced visibility for smaller values of
$L_{\mathrm{shear}}/L_\perp$. For $L_{\mathrm{shear}}/L_\perp > 1$, the partial coherence is not
sufficient for phase-contrast effects to be measurable. The ratio depends on the source size
(here: 10 µm), the geometrical magnification $M$ as well as the size of features of interest. Note
that other important factors such as the detector resolution are not considered. b Effective
propagation distance as a function of the inverse magnification $1/M$. For a given
source-to-detector distance $z_{02}$, the maximum effective propagation distance is
$z_{\mathrm{eff,max}} = 0.25\,z_{02}$, reached at a geometrical magnification of $M = 2$. While it
decreases symmetrically for larger and smaller values of the inverse magnification, the Fresnel
number increases monotonically according to $F_{\mathrm{eff}}\lambda z_{02}/p^2 = 1/(M-1)$.
c System resolution given by (13.1) as a function of detector standard deviation and geometrical
magnification for different typical source sizes

While a small source size $s$ warrants sufficient partial coherence for the realization
of propagation-based phase contrast, it also compromises the flux, as the power
loading of the anode cannot be increased without melting of the target material. To
overcome this limitation, more elaborate anode schemes are required, such as rotating
anodes (cf. Fig. 13.7a) [84] or liquid-metal jets consisting of the alloy Galinstan, which
is liquid at room temperature (cf. Fig. 13.7c) [85]. In the first case, a higher photon
flux is enabled by the decrease in stationary heat development, as the interaction point
between electron focus and anode material is constantly changing. The minimum spot
size, however, lies in the range of ~70 µm. In the second case, the advantage of the
heat transfer is combined with an anode that is already in a liquid state, so that the
electron power is no longer limited by the melting of the anode material, and which is
continuously regenerated, allowing for electron power densities that can vaporize the
anode material. Additionally, electron spot sizes well below 10 µm can be reached.

Fig. 13.7 Sketch of the laboratory setups. a The X-rays are generated by a microfocus source with
a rotating copper anode. At a distance $z_{01}$ behind the source, the sample is positioned on a fully
motorized sample stage and the intensity distributions are recorded at a distance
$z_{12} = z_{02} - z_{01}$ behind the sample. Due to the comparably large source diameter, this
setup is only operated in the inverse geometry. b In this geometry, the 700 nm lines and spaces can
be resolved in both directions, as revealed in the profiles averaged over the indicated lines shown
on the right. Note that in horizontal direction also the 600 nm lines and spaces can be recognized
(not shown) due to the smaller source spot in this direction. c In a different setup, the X-rays are
generated by a liquid-metal jet source with Galinstan as anode material. Depending on the desired
resolution and field of view, the setup can be either operated in a cone-beam
($z_{01} \ll z_{12}$) or inverse geometry ($z_{01} \gg z_{12}$). d In the inverse geometry, the
600 nm lines and spaces of a test pattern can be resolved in 2D

The divergent beam emanating from the quasi-point sources and the associated
geometrical magnification $M$ result in an effective pixel size $p_{\mathrm{eff}} = p/M$ in the
object plane, as well as an effective propagation distance
$z_{\mathrm{eff}} = (z_{02} - z_{01})/M$ and effective Fresnel number
$F_{\mathrm{eff}} = p_{\mathrm{eff}}^2/(z_{\mathrm{eff}}\lambda)$, based on the Fresnel scaling
theorem [2]. Interestingly, the maximum effective propagation distance is given by
$0.25\,z_{02}$, reached at a magnification of $M = 2$, while for larger or smaller values of the
magnification, the effective propagation distance decreases symmetrically (as a function of
$1/M$), see Fig. 13.6b. As the effective Fresnel number, on the other hand, also depends on the
effective pixel size, it deviates from the symmetric behavior of the effective propagation
distance and hence can be freely adjusted by changing the geometric magnification.

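
The relations of this paragraph can be checked numerically. The sketch below computes
the magnification, the effective pixel size, the effective propagation distance and the
effective Fresnel number for a given geometry; the function name and the example values
are assumptions for illustration.

```python
def effective_geometry(z01, z02, pixel_size, wavelength):
    """Return (M, p_eff, z_eff, F_eff) for a cone-beam setup (lengths in metres)."""
    magnification = z02 / z01
    p_eff = pixel_size / magnification
    z_eff = (z02 - z01) / magnification
    f_eff = p_eff**2 / (z_eff * wavelength)
    return magnification, p_eff, z_eff, f_eff


z02 = 1.0            # assumed source-to-detector distance
pixel_size = 1e-6    # assumed detector pixel size
wavelength = 1e-10   # assumed wavelength

for inv_m in (0.25, 0.5, 0.75):
    mag, p_eff, z_eff, f_eff = effective_geometry(inv_m * z02, z02, pixel_size, wavelength)
    print(f"1/M = {inv_m:.2f}: z_eff = {z_eff:.4f} m, F_eff = {f_eff:.3e}")
```

The loop illustrates the symmetry of the effective propagation distance about 1/M = 0.5,
whereas the effective Fresnel number keeps changing monotonically with the magnification.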
Laboratory setups are often implemented in geometries which correspond to either
of two limiting cases (cf. Fig. 13.7c). In the ‘cone-beam geometry’, the source-to-sample
distance $z_{01}$ is small compared to the sample-to-detector distance $z_{12}$,
leading to a large magnification $M \gg 1$, while in the ‘inverse geometry’, the
sample is moved close to the detector, resulting in a small magnification $M \approx 1$. The
resolution of the imaging system as a function of the source standard deviation
$\sigma_{\mathrm{src}}$, corresponding to a Gaussian source distribution, and detector standard
deviation $\sigma_{\mathrm{det}}$ is given by [86]

$$\sigma_{\mathrm{sys}} = \sqrt{(M-1)^2 M^{-2}\sigma_{\mathrm{src}}^2 + M^{-2}\sigma_{\mathrm{det}}^2}. \qquad (13.1)$$

Hence, in the cone-beam geometry, the resolution is limited by the source size,
while in the inverse geometry, the resolution of the detector is the limiting factor.
In Fig. 13.6c the system resolution as a function of geometrical magnification and
detector standard deviation is shown for typical source sizes of a liquid-metal jet
source ($\mathrm{FWHM}_{\mathrm{src}} = 4$ µm and $\mathrm{FWHM}_{\mathrm{src}} = 10$ µm) and a
source with a rotating anode ($\mathrm{FWHM}_{\mathrm{src}} = 70$ µm). It is evident that for the
smaller source sizes, resolutions well below 5 µm can be reached in both geometries, while for
the larger source size, only the inverse geometry provides resolutions sufficient to resolve
features smaller than 10 µm. Provided that a detector with a point-spread function (PSF) of
standard deviation in the range of ~1 µm is available, the highest resolution for both types of
X-ray sources can be reached in inverse geometry, and is of the same order of magnitude
as the detector resolution. This could be experimentally validated by imaging an
absorbing test pattern with the XSight Micron (Rigaku, Czech Republic) [47, 87],
a lens-coupled high-resolution detector, showing that half-period resolutions well
below 1 µm are possible (cf. Fig. 13.7b, d). Note that the constraints of shearing length
and system resolution result in a similar expression for the maximum magnification.
In order to allow for tomographic imaging at the laboratory, the setup should 
comprise a fully motorized sample tower, containing one rotational axis for the 
tomographic scans, three translations above the rotation axis for the alignment of the 
sample in the field of view and one translation perpendicular to the optical axis for 
alignment of the rotation axis. Additionally, a further translation along the optical
axis can be used for varying the source-to-sample distance $z_{01}$. By also enabling
the motion of the detector perpendicular to the optical axis, all relevant degrees of 
freedom for a proper alignment of the rotation axis are given [47, 90, 91]. 

Data analysis starts by phase retrieval on the individual projections, followed 
by tomographic reconstruction of the 3D volume in which the cone-beam geome- 
try of the setup has to be taken into account, e.g., by using the implementation of 
the algorithm by Feldkamp, Davis and Kress (FDK) [92] provided by the ASTRA 
tomography toolbox [93, 94]. As shown in [89], a suitable phase-reconstruction 
strategy follows the Bronnikov-aided correction (BAC) [88], since this approach is 
robust with respect to the non-ideal beam conditions at compact X-ray sources such 
as low spatial coherence and large bandpass, providing sharp and quantitative recon- 
structions of the sample’s 3D density distribution (cf. Fig. 13.8). More examples of 
the results that can be obtained at the laboratory, both in cone-beam and inverse 
geometry, can be found in Sect. 13.6.


13.5 Novel Tomography Approaches


13.5.1 Combined Phase Retrieval and Tomographic 
Reconstruction 


In the classical phase-contrast tomography approach, data reconstruction is performed
in a sequential manner, i.e., it starts with phase retrieval on the individual
projections, followed by tomographic reconstruction. The main challenge in this
reconstruction scheme is the phase-retrieval step, and many methods have been developed
relying on, e.g., the linearization of the transport of intensity equation [74, 75]
or an analytic form of the free-space contrast transfer functions [10, 82]. However, in
most cases, phase-retrieval techniques require additional assumptions like negligible
absorption, slowly varying phase, or known compact support, as well as measurements
from several planes. In [17], a combination of phase retrieval
and tomographic reconstruction was introduced, called ‘iterative reprojection phase
retrieval (IRP)’, which can overcome these limitations, providing reconstructions of
the object from projections acquired at a single measurement plane without further
assumptions on the phase-shifting and absorption properties of the object or
its support. It relies on the Helgason-Ludwig consistency, which states that tomographic
projections are not independent from each other, as the finite size of the
object imposes systematic correlations. This helps to phase in particular low spatial
frequencies, which pose a significant challenge in single-distance phase retrieval.
The general scheme of the IRP algorithm is shown in Fig. 13.9a.

Fig. 13.8 Comparison of different phase-retrieval approaches. a Results from the raw projections.
Top: Virtual slice through the reconstructed pedipalp of an iodine stained cobweb spider. Bottom:
Volume rendering of its thorax, showing individual muscle strands that appear hollow due to the
edge-enhancement effects caused by free-space propagation between the object and the detector.
After the application of the b MBA [74], c SMO [75] and d BAC approach [88], quantitative gray
values can be reconstructed at high signal-to-noise ratio, though at the cost of resolution in the case
of the MBA and SMO. Only the BAC provides a reconstruction with a resolution comparable to
the raw data. Scale bars: 100 µm. Adapted from [89]

Fig. 13.9 Combined phase retrieval and tomographic reconstruction. a Sketch of the combined
iterative reconstruction algorithm called ‘iterative reprojection phase retrieval (IRP)’ [17]. P denotes
the tomographic projection, whereas D stands for Fresnel propagation. b Virtual slice through the
reconstruction obtained by the IRP scheme. c Virtual slice through the reconstruction obtained
after application of the CTF approach for phase retrieval and subsequent filtered backprojection
(FBP). Compared to the IRP approach, a lower signal-to-noise ratio can be observed and twin image
artifacts occur due to the imperfect phase retrieval. Scale bars: 5 µm. Adapted from [14, 17]

Reconstruction starts with an initial guess for the 3D distribution of the refractive
index decrement $\delta(\mathbf{r})$ as well as for $\beta(\mathbf{r})$. By forward projection,
a first iteration of the phase and amplitude distribution of the exit wave
$\Phi_\alpha(x, y) \propto \exp\left(-k \int [i\delta_\alpha(\mathbf{r}) + \beta_\alpha(\mathbf{r})]\,dz\right)$,
i.e., the wave field directly behind the object, is
obtained for each tomographic angle $\alpha$. Subsequently, the exit wave for each angle
is propagated to the detection plane (Fresnel propagator $D$) and the magnitude constraint
is enforced. Back propagation then yields a modified exit wave $\Phi'_\alpha$. In the last
step, the 3D distributions of $\delta(\mathbf{r})$ and $\beta(\mathbf{r})$ are reconstructed from
these modified exit waves in a similar fashion as in the algebraic reconstruction technique
(ART). By also enforcing positivity of the electron density, corresponding to
$\delta(\mathbf{r}) > 0$, as well as positivity for $\beta(\mathbf{r})$ (no generation of X-rays),
additional unrestrictive constraints can be implemented for the object. This basic sequence of
propagators and projectors is iterated $M$ times, leading to a consistent tomographic
reconstruction. The result of the IRP algorithm for the example of a barium-stained macrophage
is shown in Fig. 13.9b [14]. A higher signal-to-noise ratio can be reached and twin image
artifacts are reduced, compared to the standard sequential scheme (with CTF-based
reconstruction of all projections followed by a filtered backprojection), which is depicted
in Fig. 13.9c.


13.5.2 Tomographic Reconstruction Based on the 3D Radon
Transform (3DRT) 


As discussed in Sect. 13.4, a significant challenge in laboratory-based phase-contrast 
tomography is to reach sufficient brilliance with a laboratory source. To this end, small 
source sizes are required, which often means insufficient photon flux. In [95], a novel 
tomographic reconstruction approach based on the 3D Radon transform (3DRT) 
has been introduced, instead of the 2D Radon transform (2DRT) used in classical 
tomography. The 3DRT could help to solve this intensity/coherence dilemma, as it 
allows for relaxations of the source size in one of the two source dimensions, while 
exploiting the smaller dimension for resolution and coherence. 

Fig. 13.10 Tomography with extended sources based on the 3D Radon transform. a Empty-beam
corrected projection of a gerbil cochlea prior to tomographic reconstruction. The bar at the top
indicates the effective width of the source. b Numerical reprojection of the reconstructed volume,
showing a sharp image of the projected cochlea. c Slice through the reconstructed volume using
the 3D Radon transform. Scale bars: 1 mm. Adapted from [96]

To this end, it was shown that by proper extension of the data recording scheme,
in particular rotation around two tomographic axes instead of one, an experimental
realization of the area integrals required for 3DRT becomes possible. Within this
scheme, the recorded projections are integrated along the ‘low-resolution direction’
in which the source spot is elongated. At the same time, the resolution and contrast
of the entire 3D object reconstruction are determined by the perpendicular
‘high-resolution direction’. The 3DRT filtered backprojection is performed analogously to
the 2D case by filtering the 1D absorption profiles and subsequent backprojection or
‘smearing’ into the 3D space. Note, however, that the filter function in Fourier space
is given by $k^2$ as opposed to $|k|$ in the 2D case. Figure 13.10 presents an example
of a 3DRT reconstruction applied to experimental data of a gerbil cochlea, which
was recorded with anisotropic source conditions [96]. In the recorded projection
in (a), the blurring in horizontal direction is clearly visible. After reconstruction
with the 3DRT, the numerical reprojection under the same angle in (b) as well as
the virtual slice in (c) show an isotropically sharp representation of the cochlea.
Note that the sampling scheme is chosen such that the full 3D Fourier space is
equidistantly sampled. However, in the practical implementation, the computation
of additional 1D profiles from each of the anisotropically blurred projections in a
sector of ±40° around the high-resolution direction yielded better results. As this
leads to a nonuniform sampling of the unit sphere, partitioning of the hemisphere
into Voronoi regions was used for normalization in the 3DRT reconstruction step.
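
The difference between the 2D and 3D filters can be illustrated on a single 1D profile.
In the sketch below, the profile is multiplied in Fourier space by the absolute frequency
(2DRT) or by the squared frequency (3DRT). Constant prefactors of the inversion formulas,
apodization and the Voronoi-based normalization are omitted, and the function name and
test profile are assumptions for illustration.

```python
import numpy as np


def filter_profile(profile, radon_dim=3):
    """Filter one 1D profile prior to backprojection (|k| for 2DRT, k^2 for 3DRT)."""
    profile = np.asarray(profile, dtype=float)
    k = 2.0 * np.pi * np.fft.fftfreq(profile.size)  # angular frequency per sample
    weight = k**2 if radon_dim == 3 else np.abs(k)
    return np.fft.ifft(np.fft.fft(profile) * weight).real


# Test profile: a single cosine with four periods over 64 samples (assumed data).
x = np.arange(64)
profile = np.cos(2.0 * np.pi * 4 * x / 64)
filtered_3d = filter_profile(profile, radon_dim=3)
filtered_2d = filter_profile(profile, radon_dim=2)
```

For a single harmonic the two filters differ only by one factor of the frequency, so the
3D filter emphasizes fine detail (and noise) more strongly than the ramp filter of the
2D case.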

The 3DRT can also be used for phase-contrast imaging based on, e.g., grating 
interferometry, edge illumination or free space propagation. This was demonstrated 
in [95] for the case of propagation imaging, taking the example of a common match. 
Phase retrieval according to the BAC algorithm [88] was performed prior to the 
tomographic reconstruction. This phase retrieval step was carried out on the sinogram 
and hence on the 1D projections acquired after integration along the low-resolution 
direction. Note that only in the high-resolution direction of the anisotropic source, 
the required spatial coherence was provided. 

One particular motivation to develop reconstruction based on the 3DRT is related 
to local or region-of-interest tomography. It can be shown mathematically that the 
reconstruction only depends on the local values of the Radon transformed object 
function, so that artifacts introduced by object components outside the reconstruc- 
tion volume, which often affect the image quality in local tomography of standard 
2DRT, should in principle be suppressed. This, however, could not be confirmed by 
numerical simulations. 


13.6 Tomography of Biological Tissues: Applications 
and Benchmarks 


As is well known from classical histology, physiological function is enabled by the 
underlying tissue structure, and conversely, alterations lead to different pathological 
states, e.g., in neurodegenerative diseases. Deciphering the 3D tissue structure from 
the whole organ down to the cellular scale enables the quantification of these
relations and the underlying mechanisms. Conventional approaches such as histological
sectioning or electron microscopy (EM) are associated with serial sectioning, staining
and subsequent investigation under a light or electron microscope. They provide
excellent results on single 2D sections, but the 3D anatomy can only be determined 
after aligning the individual sections, leading to a non-isotropic resolution within the 
tissue. Apart from possible artifacts due to the slicing or staining procedure, they are 
labor-intensive and time-consuming techniques, impeding the visualization of large 
fields of view, e.g. entire organs, even at moderate resolution. 

To this end, phase-contrast tomography based on free propagation offers a unique 
capability for high resolution imaging of soft tissues over a cross section of several 
mm, and with a geometric zoom capability to visualize selected regions of interest 
down to 20-50 nm voxel sizes. Zoom-tomography is enabled by variation of the 
focus-to-sample distance, yielding 3D reconstructions at selectable magnification, 
resolution and field of view (FOV). The zoom capability and the dose efficiency 
are particularly pronounced if highly divergent and highly coherent beams with
low wavefront distortions are available. Such wavefronts are provided by optimized
X-ray waveguide optics, as presented in Sect. 13.2. In combination with suitable 
phase-retrieval algorithms, challenging radiation sensitive and low-contrast samples 
can be reconstructed with minimal artifacts. 

In this section, we review phase-contrast X-ray tomography of biological tissues,
presenting examples and benchmark studies for two different cases: first, phase-contrast
µ-CT using (in-house) laboratory sources, and second, nano-CT using synchrotron
radiation (SR). The first case is illustrated by tomography on the scale of
small animal organs, notably the cochlea [97, 98], as well as tomography at the small
animal scale [99]. For the second case, we present tomography of nerves from mouse
(optic nerve, sciatic nerve), showing each axon in the nerve with details such as the
node of Ranvier and Schmidt-Lanterman incisures [100], lung tissue for asthma and
control mice [27], and finally high-resolution reconstructions of human cerebellum,
yielding the precise locations of neurons in the molecular and granular layer [61].
The last example comprises both synchrotron nano-CT and laboratory µ-CT.


13.6.1 3D Structure of Cochlea 


Imaging of the delicate and complex anatomy of the cochlea in small animal models 
is required to understand malformations caused by genetic defects, to guide new 
treatments and to develop cochlear implants [101, 102], including novel optogenetic 
approaches [103]. Cochlea imaging is perfectly suited to illustrate the particular 
advantages of phase-contrast X-ray imaging, since soft tissues and membranes have 
to be visualized while surrounded by bone. Phase-contrast tomography of cochleae 
using synchrotron radiation can overcome the limitations of imaging approaches 
such as classical histology or magnetic resonance imaging [102, 104, 105]. Reach- 
ing sufficient contrast and resolution at laboratory sources, however, poses a much 
larger challenge. In [97], a well-chosen combination of a liquid-metal jet anode (cf.
Sect. 13.4), high-resolution detectors, an optimized geometry and reconstruction
algorithms was used to achieve sufficient contrast and resolution down to 2 µm,
enabling the visualization of thin membranes and nerve fibers surrounded by bone. 
Importantly, the high data quality allowed for automatic histogram-based segmen- 
tation between bone and soft tissue. Figure 13.11 illustrates the achieved contrast, 
data quality and resolution for the visualization of thin membranes and nerve fibers 
within the cochlea. 
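
A histogram-based separation of two material classes can be as simple as a global
threshold. The sketch below uses Otsu's method, which picks the threshold that maximizes
the between-class variance of the gray-value histogram. Whether [97] used this particular
method is not stated here, so it should be read as one possible choice. The function name
and the synthetic gray values are assumptions for illustration.

```python
import numpy as np


def otsu_threshold(values, bins=256):
    """Gray-value threshold that maximizes the between-class variance."""
    hist, edges = np.histogram(values, bins=bins)
    centers = 0.5 * (edges[:-1] + edges[1:])
    w0 = np.cumsum(hist)                 # number of voxels at or below each bin
    w1 = w0[-1] - w0                     # number of voxels above each bin
    cum_mean = np.cumsum(hist * centers)
    with np.errstate(divide="ignore", invalid="ignore"):
        m0 = cum_mean / w0
        m1 = (cum_mean[-1] - cum_mean) / w1
        between = w0 * w1 * (m0 - m1) ** 2
    between = np.nan_to_num(between)     # the last bin has an empty upper class
    return edges[np.argmax(between) + 1]  # upper edge of the selected bin


# Synthetic example: two well-separated gray-value populations (assumed data).
rng = np.random.default_rng(0)
soft_tissue = rng.normal(1.0, 0.05, 5000)
bone = rng.normal(2.0, 0.05, 5000)
values = np.concatenate([soft_tissue, bone])
threshold = otsu_threshold(values)
bone_mask = values > threshold
```

Such a global threshold is only meaningful if the two histogram peaks are well separated,
which is the data quality argument made in the text.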

Fig. 13.11 Results of in-house phase-contrast tomography on cochlea. a 3D visualization of a
mouse cochlea with bone (brown, semi-transparent), basilar membrane (green), Reissner’s membrane
(yellow), Rosenthal’s canal (blue) and optical fiber (gray). b Magnified view showing nerve
tissue (orange). The nerve fibers of the spiral ganglion pass out between the two layers of the lamina
spiralis ossea (bottom layer shown in magenta). c, d Slices through regions of interest within the
cochlea, revealing scala tympani (ST), basilar membrane (BM), scala vestibuli et media (SVM) and
spiral ganglion (SG) for c cone-beam and d inverse geometry. Finer nerve fibers are resolved in the
inverse geometry (see inset). Scale bars: 200 µm and 20 µm (insets). Adapted from [97]

The presented results show that polychromatic illumination of laboratory X-ray
sources does not per se impede high data quality. However, the reconstructed grey
values can by no means be regarded as quantitatively correct. Neither does the
phase-retrieval approach by [88] properly separate phase from amplitude, nor is phase or
amplitude well defined in the case of a broad bandpass. Effects of beam hardening
make even effective values for the (mean) photon energy extremely questionable. Such
problems are often particularly noticeable if strongly absorbing materials such as metal
are present in the object. Cochlear implants with wires and electric components in
the vicinity of soft tissues fall into this category of multiple-material objects with
strong differences in the $\delta/\beta$-ratio. To find a solution for such applications, a new
class of narrow-band and compact radiation sources was evaluated in [98], based on
the interaction of accelerated electrons and laser photons (inverse Compton effect)
[106]. As a prototype of such sources, the Munich Compact Light Source (MuCLS)
generates narrow-band X-ray photons within a continuously tunable energy spectrum
[107-109], providing a very useful source for phase-contrast imaging, and closing
a performance gap between conventional laboratory instruments and synchrotron
facilities. MuCLS data enabled high quality reconstruction of the functional soft
tissue within guinea pig and marmoset cochleae even in the presence of an electrical
cochlear implant with metallic components. Figure 13.12 illustrates imaging of a
guinea pig cochlea, at a resolution in the range of 10 µm [98]. The higher and
tunable photon energy and in particular the narrow bandpass of the MuCLS allow
in principle for more quantitative reconstruction values (grey levels) than possible
with conventional laboratory microfocus X-ray sources.


13.6.2 Small Animal Imaging 


In the next example, we show that not only excised organs, but entire small animals 
are amenable to propagation-based phase-contrast tomography even at compact lab- 
oratory sources. This is important since synchrotron radiation sources are rarely 
in direct vicinity of small animal and biomedical research facilities, and beamtime 
scheduling constraints easily interfere with the requirements of small animal studies. 
Conversely, a much wider range of premedical research applications can be addressed
after translation of phase-contrast tomography to the laboratory scale. 

The chosen example is concerned with in situ 3D lung imaging of small animals. 
Phase contrast had been demonstrated earlier for this application by grating-based 
phase contrast [110, 111], which does not, however, achieve the resolution to resolve 
small features in tissue. With the advent of improved sources, instrumentation and 
analysis, 2D [112] and 3D [99] propagation-based phase-contrast imaging has now 
become practical also at the level of small animals. A suitable strategy, demonstrated 
in [99], is as follows: First, large overview scans are recorded in absorption contrast.
Subsequently, by changing the geometric parameters accordingly and increasing the
magnification of the setup, a phase-contrast data set is acquired in local tomography
mode. As shown in [99] and illustrated in Fig. 13.13, fine terminal airways and
thousands of small alveoli of the lung can be resolved at a resolution of about 5 µm,
despite the rather thick and absorbing surrounding tissue. 

The results required an optimization of the energetic spectrum by pre-hardening. 
Hence, for future experiments it would be useful to enrich the liquid jet alloys with 
indium or to replace it with a suitable silver alloy. By further improvements resulting 
in a decrease of the acquisition time by a factor of ten, live animal phase-contrast 
imaging would become possible. The high availability of laboratory sources would 
thus enable longitudinal studies, as well as the necessary statistical power. 
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Fig. 13.12 Reconstruction results for a guinea pig cochlea measured at the MuCLS. a Virtual slice 
through the reconstructed 3D volume, showing the typical anatomical features of the cochlea in high 
detail without artifacts like beam hardening. In particular, the Rosenthal’s canal (RC), the osseous 
spiral lamina (OSL) and the stria vascularis (STV) can be recognized. In the inset, in which contrast 
was optimized for the soft tissue components, also the basilar membrane (BM) and the Reissner’s 
membrane (RM) are visible as well as the corresponding chambers separated by these membranes, 
the scala tympani (ST), scala media (SM) and the scala vestibuli (SV). b 3D rendering of part of 
the volume with a cut revealing the inner structure of the cochlea in high detail. c, d Segmentation 
of typical anatomical features of the cochlea together with a volume rendering displayed semi- 
transparently to put it in context. Note that due to rupturing, the theoretical shape of the membranes 
was derived from the position of typical landmarks in the volume. The segmentation includes the 
ossicles (malleus, incus and stapes), the round window membrane (RWM) as well as the OSL, RC, 
RM and BM. Scale bar: 1 mm. Adapted from [98] 
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Fig. 13.13 a Projection of the large-FOV data set that covers the thorax of the whole mouse (top), 
showing mostly absorption contrast, and a phase-contrast projection obtained by zooming into the 
region marked by the rectangle (bottom). b Virtual slice through the large-FOV measurement. The 
inset shows a zoom into the lung area marked by the rectangle with adjusted contrast. ce Virtual slice 
obtained in the zoom configuration setting, where the sample-detector distance was increased to 
obtain phase contrast. d The same slice after the application of the phase retrieval approach proposed 
by Paganin et al. [75]. The signal-to-noise ratio is increased while simultaneously the gray values get 
more quantitative. e Profiles along the 6 pixel wide lines indicated in (c) and (d), respectively. The 
positive effect of phase retrieval is clearly visible. f 3D rendering of the large-FOV measurement 
(left), containing automatically segmented bones (gray), the heart (red) and lung tissue (pink). On 
the right, a 3D rendering of the zoomed reconstruction volume is shown, with orthogonal slices 
through the volume and a rendering of the soft-tissue structure. Scale bars: 2 mm (a, top and b), 
400 µm (a, bottom) and 500 µm (c, d). Adapted from [99]

13.6.3 3D Virtual Histology of Nerves


Fast conduction of action potentials in specific nerves of the peripheral or central
nervous system (PNS/CNS) is enabled by myelin sheaths, which surround the
parallel axons within the nerve and insulate them electrically from the
surrounding fluids. These myelin sheaths comprise myelin segments with
a length of 150-200 µm in the CNS and up to 1 mm in the PNS [113, 114], followed
by myelin-free gaps called nodes of Ranvier (RN). This segmental structure results in
saltatory conduction, with the action potentials propagating from one node of Ranvier
to the next, where a large sodium influx regenerates the signal [115].
At more or less regular intervals within the myelin segments of the PNS, the compact
myelin structure is interrupted by clefts, the so-called Schmidt-Lanterman (SL) incisures.

3D virtual histology by phase-contrast tomography offers unique access to
the spatial organization of the axon bundles within the nerve and can answer questions
of axon organization and size distribution, as well as correlations of RN and
SL between different axons. In [100], entire (uncut) optic, saphenous and sciatic
nerves were prepared from mouse using high-pressure freezing and scanned using
the nanofocus KB optics at the legacy beamline ID22NI of ESRF. In subsequent work,
the recent ESRF upgrade beamline ID16A [43] as well as the upgraded GINIX
endstation at beamline P10/PETRA III [44] have been used to demonstrate the suitability
of these novel setups for nerve tomography [116].

It was found that intrinsic electron density without additional labeling or staining 
is sufficient to identify axonal structures. However, to specifically image the myelin 
sheath surrounding the axon, labeling by an osmium tetroxide stain was required. 
By placing the nerve at different defocus positions in the diverging waveguide, both
overview scans of entire sciatic nerves and zoom tomograms of relevant
sub-structures such as nodes of Ranvier and Schmidt-Lanterman incisures were recorded
(cf. Fig. 13.14). The reconstructions were found to be very consistent with histology 
sections and EM micrographs, but offered the clear advantage of probing much larger 
volumes that could be visualized with isotropic 3D resolution. 


13.6.4 Macrophages in Lung Tissue 


The lung is the primary organ of the respiratory system in air-breathing vertebrates. 
It enables the oxygen exchange between the inhaled oxygen-rich air and the blood 
in the cardiovascular system of the body. The air is transported through the trachea, 
which branches into many bronchi and bronchioles that eventually end in the alveoli. 
This anatomical structure leads to a continuous surface enlargement, enabling a fast 
exchange of oxygen between the alveoli and surrounding blood vessels. One of
the major diseases associated with the lung is asthma, with typical symptoms such as
coughing or shortness of breath, the cause and progression of which are still not fully
understood [117]. It leads to a chronic inflammation of the respiratory tract, especially

Fig. 13.14 Phase-contrast tomography of nerves. a Three-dimensional visualization of a mouse
saphenous nerve stained with osmium and embedded in agarose (voxel size 430 nm). The nerve is
rendered in blue, while an adjacent blood vessel is depicted in red. Additionally, a longitudinal 
virtual slice is shown, revealing the single axons within the nerve due to the high electron density 
of the osmium-stained myelin sheath. b Virtual slice through an EPON-embedded osmium stained 
mouse sciatic nerve measured with a voxel size of 430 nm. c Virtual slice through a zoom-tomogram
of the same nerve (50 nm voxel size), recorded in the region marked in (b). d 3D rendering of the 
same nerve measured with a voxel size of 100 nm. 20 axons (turquoise) are shown along with a 
virtual slice through the reconstructed volume. Nodes of Ranvier are rendered yellow, Schmidt- 
Lanterman incisures red. An additional rendering of 13 axons (black box) suggests a correlation 
between the positions of these nodes and incisures of neighbouring axons. Scale bars: 50 µm (a),
100 µm (b) and 10 µm (c). Adapted from [100]


the bronchi and bronchioles. Macrophages, which are part of the immune system, are 
a special kind of phagocytes, protecting the organism by ingesting harmful pathogens 
and other foreign substances. They are known to be involved in processes of allergic 
inflammation [118], but their precise role in asthma and the underlying mechanisms 
are still debated [119], including in particular their migration properties [120]. 

High-resolution X-ray phase-contrast tomography is a promising tool to visualize
the 3D distribution of macrophages in situ. In [27], tissue slices from lungs of mice
were imaged at the legacy beamline ID22NI of ESRF as well as the GINIX setup with
voxel sizes in the range of 50-430 nm. In this study, the intricate three-dimensional
(3D) structure of lung tissue was visualized, with its system of the bronchial tree, alve- 
oli, and blood vessels (see Fig. 13.15). In addition, the distribution of macrophages 
and their migration properties within the lung were investigated by 3D visualiza- 
tion with high resolution and contrast. Precise tracking of alveolar macrophages in 
relation to anatomical structures was enabled by barium-labeling [27, 121]. It was 
shown that the intratracheally applied macrophages (MH-S cell line [122]) localize 
predominantly on alveoli and are able to penetrate the epithelial layer between the 
airway lumen and parenchyma. 

Fig. 13.15 Phase-contrast tomography on lung tissue. a Virtual slice through the reconstructed
asthmatic lung tissue obtained at ID22NI (ESRF) with a voxel size of p = 430 nm. Barium sulphate 
particles (black) and fat (white) show a strong density contrast compared to soft tissue. A blood 
vessel (BV) and a bronchial tube (BT) can be identified based on their different wall morphologies. 
b, c Virtual slices through the reconstructed lung tissue from a healthy control measured at the 
GINIX setup (DESY) both in an overview scan (b, voxel size: 245 nm) and a zoom configuration 
(c, voxel size: 52 nm). The position of the zoom scan is indicated by a rectangle in (b). d 3D
rendering of the ID22NI data set together with barium clusters (green), alveolar walls in a small 
ROI (yellow) and part of a blood vessel (purple). e, f 3D visualization of the tomography results 
obtained at the GINIX setup. The 3D renderings show barium aggregates in macrophages (green), 
part of a blood vessel (purple), the bronchial wall (yellow) and the outline of a single macrophage 
(blue). This cell is additionally shown at higher magnification in (f, bottom). Scale bars: 100 µm
(a), 50 µm (b) and 10 µm (c). Adapted from [27]


13.6.5 Neuron Locations in Human Cerebellum 


The cerebellum, which is among other things important for the maintenance of 
upright posture and synergy of movements [123], is located at the back of the brain 
of mammals. Compared to the largest part of the brain, the cerebral cortex, it has 
a significantly higher cell density and contains 80% of the total number of neurons 
within the human brain, despite its relatively small weight of ~10% of the total brain
mass [124]. The cerebellum generally consists of the tightly folded cerebellar cortex, 
comprising three distinct layers, the cell-rich granular layer, the low-cell molecular 
layer and the intermediary mono-cellular Purkinje cell layer, located above white 
matter with a large amount of axon bundles. 

Studying the cytoarchitecture of the cerebellum via propagation-based phase-contrast
tomography requires additional contrast enhancement, as hydrated
unstained tissue does not allow for an unambiguous identification of all cells in the
densely packed granular layer [47]. In contrast to radiocontrast agents, which specifically
increase contrast in certain features of the sample, e.g., the myelin sheaths
in mouse nerves or macrophages in the mouse lung, a global contrast enhancement
can be reached by exchanging the surrounding medium with a medium of lower
electron density. This makes it possible to examine conventional neuropathological
samples such as human brain tissue obtained during routine autopsy, since these are usually
embedded in paraffin after fixation.

In [61], propagation-based phase-contrast tomography was performed on
paraffin-embedded human cerebellum both at the GINIX endstation and at the
laboratory setup in inverse geometry, providing insights into the 3D cytoarchitecture
at the sub-cellular level (cf. Fig. 13.16). To fully exploit the potential of this
3D virtual histology, a workflow was developed to automatically locate the
small cells in the molecular and granular layer, leading to the segmentation of several
tens of thousands (GINIX) to ~1.8 million cells (laboratory). This has enabled the
analysis of the spatial organization of neurons in the granular layer, e.g., based on local
density estimations or pair correlation functions, pointing towards a strong short-range
order of these cells visible as local clustering (cf. Fig. 13.17). Moreover, the
availability of the exact cell positions in 3D allows for the precise quantification of
cellular distributions, revealing an anisotropy in the arrangement of nearest neighbors
within the granular layer which is governed by the principal directions of the
Purkinje cell layer, a result which would not have been accessible by conventional
2D histology.
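
To illustrate what an angular-averaged pair correlation function measures, the following is a minimal sketch, not the analysis pipeline of [61]. It assumes that the cell centers are available as an array of 3D coordinates and, purely for simplicity, that the volume can be treated as a periodic box (real tissue data would instead require an edge correction). The function name pair_correlation and all parameters are hypothetical.

```python
import numpy as np

def pair_correlation(points, box, r_max, n_bins=20):
    """Angular-averaged pair correlation g(r) of 3D points.

    Assumption of this sketch: the points live in a periodic box, so
    distances are computed with the minimum-image convention.
    """
    points = np.asarray(points, dtype=float)
    box = np.asarray(box, dtype=float)
    if r_max > box.min() / 2:
        raise ValueError("r_max must not exceed half the smallest box length")
    n = len(points)

    # Pairwise displacement vectors, wrapped into the periodic box
    d = points[:, None, :] - points[None, :, :]
    d -= box * np.round(d / box)
    dist = np.sqrt((d ** 2).sum(axis=-1))

    # Keep each unordered pair once
    dist = dist[np.triu_indices(n, k=1)]

    # Histogram of pair distances in spherical shells
    edges = np.linspace(0.0, r_max, n_bins + 1)
    counts, _ = np.histogram(dist, bins=edges)

    # Expected pair counts for completely random (Poisson) positions
    shell_vol = 4.0 / 3.0 * np.pi * (edges[1:] ** 3 - edges[:-1] ** 3)
    expected = n * (n - 1) / 2.0 * shell_vol / np.prod(box)

    r = 0.5 * (edges[1:] + edges[:-1])
    return r, counts / expected
```

For randomly placed points, g(r) fluctuates around 1 at all distances. Values well above 1 at short distances indicate that neighbors are found more often than expected by chance, which is the signature of local clustering referred to above.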


13.6.6 Outlook: Time-Resolved Phase-Contrast Tomography 


Together with the progress in detector technology, the current upgrades of syn- 
chrotron sources to a multi-bend achromat lattice and the corresponding increase 
in brilliance will offer unique opportunities for time-resolved (dynamic) tomogra- 
phy. In other words, the data acquisition rate f could become high enough to observe 
dynamic processes in biological matter on the micro-scale and in some cases even on 
sub-micron scales. The phase-contrast imaging capabilities presented in this chapter 
could thus be (at least partially) extended from 3D to 4D (time & space) imaging. 
The first question to be asked concerns the temporal sampling required to probe 
the dynamic process of the object. In [125], e.g., phase-contrast tomography was
performed in vivo on Xenopus laevis embryos, revealing new aspects of their gastru- 
lation over time. The time scale of this development was long enough for the single 
tomographic scans to be considered as static, enabling data reconstruction following 
the classical approach via simple filtered backprojection, and the development of 
the gastrulation was monitored by recording several tomograms with a time lapse of 
~10 min. 

For faster processes, more elaborate and generalized recording or analysis schemes 
have to be developed to meet the challenges of dynamic tomography. In the special
case of cyclic processes such as a beating heart, acquisitions can be gated or trig- 

Fig. 13.16 Phase-contrast tomography of tissue from human cerebellum, showing reconstructions
obtained both at the GINIX endstation and the laboratory setup. a The virtual slice through the 
reconstructed volume of the synchrotron data set reveals the interface between the low-cell molecular 
(ML) and cell-rich granular layer (GL), including a cell of the mono-cellular Purkinje cell layer 
(PCL). b Corresponding slice of the laboratory data set, showing the larger volume accessible 
by the laboratory setup while maintaining the resolution required for single cell identification. A 
magnified view of the region marked by the rectangle is shown on the right, corresponding to the 
FOV of the synchrotron data set in (a). c Segmentation of the cells in the granular layer (dark red), 
the molecular layer (light red) and the Purkinje cell layer (shades of gray) with two exemplary 
Purkinje cells shown separately, from front and side view. The segmentation for the granular and 
molecular layer was performed automatically whereas for the Purkinje cell layer, a semi-automatic 
approach was used. d The same segmentation for the laboratory data set. Note that the individual 
Purkinje cells are the same as for the synchrotron data set and that the thick branches of the dendritic 
tree can already be resolved with the laboratory setup. Scale bars: 50 µm (a and b, right) and
200 µm (b, left). Adapted from (a) [47] and (b-d) [61]


gered (hardware or a posteriori software) to cover different phases of the considered 
motion. By combining projections which are recorded at the same state of the motion 
but at different rotation angles, static solutions can be generated for the different time 
points within one cycle, unraveling, e.g., the complex muscle movement during insect 
flight [126]. However, such approaches fail for non-cyclic processes. 
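
The idea of a posteriori (software) gating can be sketched as follows. This is an illustrative example only, assuming a strictly periodic motion with known period and that a time stamp and a rotation angle are stored for every projection. The function gate_projections and its arguments are hypothetical and do not reproduce the implementation of [126].

```python
import numpy as np

def gate_projections(angles_deg, times_s, period_s, n_phases):
    """Sort projections into bins of equal motion phase (retrospective gating).

    Returns a dict mapping the phase-bin index to the sorted rotation
    angles of all projections recorded in that phase of the cycle.
    """
    angles_deg = np.asarray(angles_deg, dtype=float)
    times_s = np.asarray(times_s, dtype=float)

    # Phase within the cycle, in [0, 1)
    phase = (times_s % period_s) / period_s

    # Phase-bin index of every projection (clipped to guard against rounding)
    bins = np.minimum((phase * n_phases).astype(int), n_phases - 1)

    return {k: np.sort(angles_deg[bins == k]) for k in range(n_phases)}
```

Each bin then contains projections of (approximately) the same motion state at many different rotation angles and can be reconstructed like a static tomogram. The sketch also makes the limitation explicit: without a well-defined period, there is no phase to sort by.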

In [127], an approach for the reconstruction of time-resolved processes based 
on filtered backprojection along dynamically curved paths was introduced. It can 
account for non-affine and non-cyclic motion on time scales shorter than the time 
needed for an entire tomogram, provided that the motion model can be estimated. 
The workflow is depicted in Fig. 13.18 for the example of a burning match. In order 
to monitor the burning process, 47 single tomograms with 401 projections each were 
recorded at the TOMCAT beamline (SLS, Villigen, Switzerland) while the match 

Fig. 13.17 Analysis of cell distributions in the granular layer of human cerebellum, as obtained
at the synchrotron (left column) and the laboratory setup (right column). a Local cell density 
distribution within the granular layer of the cerebellum. By considering small volumes for the 
computation of the local density, a clustering of cells within this layer can be clearly recognized 
as hotspots in the density distribution. With increasing volume, the differences in cell density are 
vanishing, leading to an almost uniform density distribution within the granular layer. b Angular 
averaged pair correlation function of the cells in the granular layer, revealing two distinct peaks 
at approximately once and twice the cell diameter, which indicates a local clustering of the cells. 
c Angular distribution of nearest neighbors in the granular layer. Note that the data sets were aligned 
with respect to the Purkinje cell layer such that the dendritic tree lies approximately in the xy-plane 
and hence at θ ≈ 90°. The majority of nearest neighbors are clearly distributed in parallel to the
dendritic tree of the Purkinje cells as hotspots in the angular distribution are visible at θ ≈ 90°.
Adapted from [61] 

Fig. 13.18 Four-dimensional movie of a burning match. a Exemplary sinogram extracted from all
18800 recorded projections. The bottom row depicts tomographic reconstructions of the highlighted 
sinogram segments, each consisting of 401 equidistant projections. While the shape, features of the 
wooden structure and the stages of the burning process can be clearly identified, the reconstructions 
show motion artifacts such as ‘streaks’ or non-closed shapes. b By estimating the motion between 
successive time points via optical flow, a better tomographic reconstruction can be carried out by 
backprojection on dynamically curved paths (right). Note that the motion amplitude was increased
by a factor of 3 for better visibility. c The improved reconstruction quality can be observed in fine
object details which can be resolved in the image on the right (red arrows), compared to the result 
of a conventional direct filtered backprojection, shown on the left. d Rendered 3D structure of the 
burning match at different time points. Scale bars: 1 mm. Adapted from [127] 


was continuously rotated at a rate of 1.25 Hz. In the exemplary sinogram in (a) at 
the top, the shrinking of the structure as well as a decrease of signal intensity due 
to the burning process can be observed. This process can be approximately depicted 
in 3D by selecting intervals of 401 projections from the sinogram and performing 
standard filtered backprojections, leading to the reconstructions shown in the lower 
row. The shape and features of the wooden structure and the stages of the burning 
process can be clearly identified. However, the reconstructions show motion artifacts
such as ‘streaks’ or non-closed shapes. These can be reduced by estimating the 
motion perpendicular to the rotation axis between each subsequent pair of slices in 
the time series via optical flow analysis [128]. This motion model is then used to 
perform filtered backprojection along dynamically curved paths, as depicted in (b), 
accounting for the motion of certain parts of the sample during the time span of 
the corresponding tomogram. The comparison between the reconstructed slice using 
this approach and a standard filtered backprojection in (c) shows that artifacts can 
be significantly reduced, enabling the investigation of the dynamics of the burning 
process at high temporal and spatial resolution in the µm range. The 4D nature of
the data is illustrated in (d), showing the rendered 3D structure of the wooden part 
of the match at 5 different points in time. 
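
The time scales of this experiment can be estimated from the numbers quoted above (rotation rate 1.25 Hz, 47 tomograms, 18800 projections in total). The short calculation below is a sketch under one assumption that is not stated explicitly in the text, namely that each tomogram covers a half rotation of 180°. The function name acquisition_timing is hypothetical.

```python
def acquisition_timing(rotation_rate_hz, n_tomograms, n_projections_total,
                       angular_range_deg=180.0):
    """Estimate time per tomogram, total scan time and projection rate.

    angular_range_deg is an assumption of this sketch (half rotation
    per tomogram); it is not taken from the source.
    """
    # Time needed to sweep the angular range of one tomogram
    t_tomogram = (angular_range_deg / 360.0) / rotation_rate_hz
    # Duration of the whole time series
    t_total = n_tomograms * t_tomogram
    # Average number of projections recorded per second
    projection_rate = n_projections_total / t_total
    return t_tomogram, t_total, projection_rate


t_tomogram, t_total, rate = acquisition_timing(1.25, 47, 18800)
print(f"{t_tomogram:.2f} s per tomogram, {t_total:.1f} s in total, "
      f"{rate:.0f} projections per second")
```

Under this assumption, one tomogram takes about 0.4 s, the full series about 18.8 s, and the projections are recorded at roughly 1000 per second. Any motion faster than the 0.4 s window therefore shows up as artifacts in a standard reconstruction, which is what the motion-compensated backprojection is meant to correct.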

As this simple example of the burning match shows, time-resolved phase-contrast 
tomography based on advanced reconstruction schemes allows us to observe dynamic 
processes in the interior of biomaterials and biological matter. With further improve- 
ments, 4D reconstructions of such processes in the interior of multi-cellular assem- 
blies and tissues up to the level of entire organs and small animals can be anticipated. 


References 


1. Nugent, K.A.: Coherent methods in the X-ray sciences. Adv. Phys. 59(1), 1-99 (2010)

2. Paganin, D.M.: Coherent X-ray Optics. Oxford University, New York (2006)

3. Bonse, U., Hart, M.: An X-ray interferometer. Appl. Phys. Lett. 6(8), 155-156 (1965)

4. Chapman, D., Thomlinson, W., Johnston, R.E., Washburn, D., Pisano, E., Gmür, N., Zhong,
Z., Menk, R., Arfelli, F., Sayers, D.: Diffraction enhanced X-ray imaging. Phys. Med. Biol.
42(11), 2015 (1997)

5. Pfeiffer, F., Bech, M., Bunk, O., Kraft, P., Eikenberry, E.F., Bronnimann, C., Grünzweig,
C., David, C.: Hard-X-ray dark-field imaging using a grating interferometer. Nat. Mater. 7,
134-137 (2008)

6. Weitkamp, T., Diaz, A., David, C., Pfeiffer, F., Stampanoni, M., Cloetens, P., Ziegler, E.:
X-ray phase imaging with a grating interferometer. Opt. Express 13(16), 6296-6304 (2005)

7. Munro, P.R.T., Ignatyev, K., Speller, R.D., Olivo, A.: Phase and absorption retrieval using
incoherent x-ray sources. PNAS 109(35), 13922-13927 (2012)

8. Olivo, A., Speller, R.: A coded-aperture technique allowing X-ray phase contrast imaging
with conventional sources. Appl. Phys. Lett. 91(7), 074106 (2007)

9. Zanette, I., Zhou, T., Burvall, A., Lundström, U., Larsson, D.H., Zdora, M., Thibault, P.,
Pfeiffer, F., Hertz, H.M.: Speckle-based X-ray phase-contrast and dark-field imaging with a
laboratory source. Phys. Rev. Lett. 112(25), 253903 (2014)

10. Cloetens, P., Ludwig, W., Baruchel, J., Van Dyck, D., Van Landuyt, J., Guigay, J.P., Schlenker,
M.: Holotomography: quantitative phase tomography with micrometer resolution using hard
synchrotron radiation X-rays. Appl. Phys. Lett. 75(19), 2912-2914 (1999)

11. Gureyev, T.E., Davis, T.J., Pogany, A., Mayo, S.C., Wilkins, S.W.: Optical phase retrieval by
use of first Born- and Rytov-Type approximations. Appl. Opt. 43(12), 2418-2430 (2004)

12. Krenkel, M., Bartels, M., Salditt, T.: Transport of intensity phase reconstruction to solve the
twin image problem in holographic X-ray imaging. Opt. Express 21(2), 2220-2235 (2013)

13. Hagemann, J., Töpperwien, M., Salditt, T.: Phase retrieval for near-field X-ray imaging beyond
linearisation or compact support. Appl. Phys. Lett. 113(4), 041109 (2018)


14. Krenkel, M., Toepperwien, M., Alves, F., Salditt, T.: Three-dimensional single-cell imaging
with X-ray waveguides in the holographic regime. Acta Crystallogr. A 73(4), 282-292 (2017)

15. Giewekemeyer, K., Krüger, S.P., Kalbfleisch, S., Bartels, M., Beta, C., Salditt, T.: X-ray
propagation microscopy of biological cells using waveguides as a quasipoint source. Phys.
Rev. A 83(2), 023804 (2011)

16. Pein, A., Loock, S., Plonka, G., Salditt, T.: Using sparsity information for iterative phase
retrieval in X-ray propagation imaging. Opt. Express 24(8), 8332-8343 (2016)

17. Ruhlandt, A., Krenkel, M., Bartels, M., Salditt, T.: Three-dimensional phase retrieval in
propagation-based phase-contrast imaging. Phys. Rev. A 89, 033847 (2014)

18. Chapman, H.N., Nugent, K.A.: Coherent lensless X-ray imaging. Nat. Photon. 4(12), 833-839
(2010)

19. Fienup, J.R.: Reconstruction of a complex-valued object from the modulus of its Fourier
transform using a support constraint. J. Opt. Soc. Am. A 4(1), 118-123 (1987)

20. Marchesini, S.: Invited article: a unified evaluation of iterative projection algorithms for phase
retrieval. Rev. Sci. Instrum. 78(1), 011301 (2007)

21. Miao, J., Charalambous, P., Kirz, J., Sayre, D.: Extending the methodology of X-ray
crystallography to allow imaging of micrometre-sized non-crystalline specimens. Nature 400(6742),
342-344 (1999)

22. Schroer, C.G., Boye, P., Feldkamp, J.M., Patommel, J., Schropp, A., Schwab, A., Stephan, S.,
Burghammer, M., Schoder, S., Riekel, C.: Coherent X-ray diffraction imaging with
nanofocused illumination. Phys. Rev. Lett. 101(9), 090801 (2008)

23. Thibault, P., Dierolf, M., Bunk, O., Menzel, A., Pfeiffer, F.: Probe retrieval in ptychographic
coherent diffractive imaging. Ultramicroscopy 109(4), 338-343 (2009)

24. Maretzke, S.: A uniqueness result for propagation-based phase contrast imaging from a single
measurement. Inverse Probl. 31(6), 065003 (2015)

25. Bartels, M., Krenkel, M., Haber, J., Wilke, R.N., Salditt, T.: X-Ray holographic imaging of
hydrated biological cells in solution. Phys. Rev. Lett. 114, 048103 (2015)

26. Hagemann, J., Robisch, A.L., Osterhoff, M., Salditt, T.: Probe reconstruction for holographic
X-ray imaging. J. Synchrotron Rad. 24(2), 498-505 (2017)

27. Krenkel, M., Markus, A., Bartels, M., Dullin, C., Alves, F., Salditt, T.: Phase-contrast zoom
tomography reveals precise locations of macrophages in mouse lungs. Sci. Rep. 5, 09973
(2015)

28. Mokso, R., Cloetens, P., Maire, E., Ludwig, W., Buffiere, J.: Nanoscale zoom tomography
with hard X-rays using Kirkpatrick-Baez optics. Appl. Phys. Lett. 90, 144104 (2007)

29. Hagemann, J., Salditt, T.: Coherence-resolution relationship in holographic and coherent
diffractive imaging. Opt. Express 26(1), 242 (2018)

30. Stangl, J., Mocuta, C., Chamard, V., Carbone, D.: Nanobeam X-ray Scattering: Probing Matter
at the Nanoscale. Wiley (2013)

31. Hagemann, J., Robisch, A.L., Luke, D.R., Homann, C., Hohage, T., Cloetens, P., Suhonen,
H., Salditt, T.: Reconstruction of wave front and object for inline holography from a set of
detection planes. Opt. Express 22(10), 11552-11569 (2014)

32. Homann, C., Hohage, T., Hagemann, J., Robisch, A.L., Salditt, T.: Validity of the empty-beam
correction in near-field imaging. Phys. Rev. A 91, 013821 (2015)

33. Robisch, A.L., Kröger, K., Rack, A., Salditt, T.: Near-field ptychography using lateral and
longitudinal shifts. New J. Phys. 17(7), 073033 (2015)

34. Robisch, A.L., Salditt, T.: Phase retrieval for object and probe using a series of defocus
near-field images. Opt. Express 21(20), 23345-23357 (2013)

35. Robisch, A.L., Wallentin, J., Pacureanu, A., Cloetens, P., Salditt, T.: Holographic imaging
with a hard X-ray nanoprobe: ptychographic versus conventional phase retrieval. Opt. Lett.
41(23), 5519-5522 (2016)

36. Stockmar, M., Cloetens, P., Zanette, I., Enders, B., Dierolf, M., Pfeiffer, F., Thibault, P.:
Near-field ptychography: phase retrieval for inline holography using a structured illumination. Sci.
Rep. 3, 1927 (2013)

37. Kewish, C.M., Guizar-Sicairos, M., Liu, C., Qian, J., Shi, B., Benson, C., Khounsary, A.M.,
Vila-Comamala, J., Bunk, O., Fienup, J.R., Macrander, A.T., Assoufid, L.: Reconstruction of
an astigmatic hard X-ray beam and alignment of K-B mirrors from ptychographic coherent
diffraction data. Opt. Express 18(22), 23420-23427 (2010)

38. Maiden, A.M., Rodenburg, J.M.: An improved ptychographical phase retrieval algorithm for
diffractive imaging. Ultramicroscopy 109(10), 1256-1262 (2009)

39. Marchesini, S., Krishnan, H., Shapiro, D.A., Perciano, T., Sethian, J.A., Daurer, B.J., Maia,
F.R.N.C.: SHARP: a distributed, GPU-based ptychographic solver. arXiv 1602.01448 (2016)

40. Schropp, A., Boye, P., Feldkamp, J.M., Hoppe, R., Patommel, J., Samberg, D., Stephan, S.,
Giewekemeyer, K., Wilke, R.N., Salditt, T., Gulden, J., Mancuso, A.P., Vartanyants, I.A.,
Weckert, E., Schoder, S., Burghammer, M., Schroer, C.G.: Hard X-ray nanobeam
characterization by coherent diffraction microscopy. Appl. Phys. Lett. 96(9), 091102 (2010)

41. Vine, D.J., Williams, G.J., Abbey, B., Pfeifer, M.A., Clark, J.N., de Jonge, M.D., McNulty, I.,
Peele, A.G., Nugent, K.A.: Ptychographic Fresnel coherent diffractive imaging. Phys. Rev. A
80(6), 063823 (2009)

42. Stockmar, M., Zanette, I., Dierolf, M., Enders, B., Clare, R., Pfeiffer, F., Cloetens, P., Bonnin,
A., Thibault, P.: X-ray near-field ptychography for optically thick specimens. Phys. Rev. Appl.
3, 014005 (2015)

43. Morawe, C., Barrett, R., Cloetens, P., Lantelme, B., Peffen, J.C., Vivo, A.: Graded multilayers
for figured Kirkpatrick-Baez mirrors on the new ESRF end station ID16A. Proc. SPIE 9588,
958803 (2015)

44. Salditt, T., Osterhoff, M., Krenkel, M., Wilke, R.N., Priebe, M., Bartels, M., Kalbfleisch, S.,
Sprung, M.: Compound focusing mirror and X-ray waveguide optics for coherent imaging
and nano-diffraction. J. Synchrotron Rad. 22(4), 867-878 (2015)

45. Allen, L.J., Oxley, M.P.: Phase retrieval from series of images obtained by defocus variation.
Opt. Commun. 199, 65-75 (2001)

46. Loetgering, L., Hammoud, R., Juschkin, L., Wilhein, T.: A phase retrieval algorithm based
on three-dimensionally translated diffraction patterns. Europhys. Lett. 111(6), 64002 (2015)

47. Töpperwien, M.: 3d virtual histology of neuronal tissue by propagation-based X-ray
phase-contrast tomography. Ph.D. thesis, Universität Göttingen (2018)

48. Fuhse, C., Ollinger, C., Salditt, T.: Waveguide-based Off-Axis holography with hard X-rays.
Phys. Rev. Lett. 97(25), 254801 (2006)

49. Jarre, A., Fuhse, C., Ollinger, C., Seeger, J., Tucoulou, R., Salditt, T.: Two-dimensional hard
X-ray beam compression by combined focusing and waveguide optics. Phys. Rev. Lett. 94(7),
074801 (2005)

50. Osterhoff, M., Salditt, T.: Coherence filtering of X-ray waveguides: analytical and numerical
approach. New J. Phys. 13(10), 103026 (2011)

51. Pfeiffer, F., David, C., Burghammer, M., Riekel, C., Salditt, T.: Two-dimensional X-ray
waveguides and point sources. Science 297(6), 230 (2002)

52. Salditt, T., Krüger, S.P., Fuhse, C., Bahtz, C.: High-transmission planar X-ray Waveguides.
Phys. Rev. Lett. 100(18), 184801-184804 (2008)

53. Krüger, S.P., Giewekemeyer, K., Kalbfleisch, S., Bartels, M., Neubauer, H., Salditt, T.: Sub-15
nm beam confinement by two crossed X-ray waveguides. Opt. Express 18(13), 13492-13501
(2010)

54. Krüger, S.P., Neubauer, H., Bartels, M., Kalbfleisch, S., Giewekemeyer, K., Wilbrandt, P.J.,
Sprung, M., Salditt, T.: Sub-10 nm beam confinement by X-ray waveguides: design,
fabrication and characterization of optical properties. J. Synchrotron Rad. 19(2), 227-236 (2012)

55. Hoffmann-Urlaub, S., Höhne, P., Kanbach, M., Salditt, T.: Advances in fabrication of X-ray
waveguides. Microelectron. Eng. 164, 135-138 (2016)

56. Neubauer, H., Hoffmann, S., Kanbach, M., Haber, J., Kalbfleisch, S., Krüger, S.P., Salditt,
T.: High aspect ratio X-ray waveguide channels fabricated by e-beam lithography and wafer
bonding. J. Appl. Phys. 115(21), 214305 (2014)

57. Chen, H.Y., Hoffmann, S., Salditt, T.: X-ray beam compression by tapered waveguides. Appl.
Phys. Lett. 106(19), 194105 (2015)


58. Salditt, T., Hoffmann, S., Vassholz, M., Haber, J., Osterhoff, M., Hilhorst, J.: X-ray optics on
a chip: guiding X-rays in curved channels. Phys. Rev. Lett. 115, 203902 (2015)

59. Bartels, M., Priebe, M., Wilke, R.N., Krüger, S., Giewekemeyer, K., Kalbfleisch, S.,
Olendrowitz, C., Sprung, M., Salditt, T.: Low-dose three-dimensional hard X-ray imaging
of bacterial cells. Opt. Nanoscopy 1(1), 10 (2012)

60. Töpperwien, M., Krenkel, M., Müller, K., Salditt, T.: Phase-contrast tomography of neuronal
tissues: from laboratory-to high resolution synchrotron CT. Proc. SPIE 9967, 99670T (2016)

61. Töpperwien, M., van der Meer, F., Stadelmann, C., Salditt, T.: Three-dimensional virtual
histology of human cerebellum by X-ray phase-contrast tomography. PNAS 115(27),
6940-6945 (2018)

62. Wilke, R.N., Priebe, M., Bartels, M., Giewekemeyer, K., Diaz, A., Karvinen, P., Salditt, T.:
Hard X-ray imaging of bacterial cells: nano-diffraction and ptychographic reconstruction.
Opt. Express 20(17), 19232-19254 (2012)

63. Wilke, R.N., Vassholz, M., Salditt, T.: Semi-transparent central stop in high-resolution X-ray
ptychography using Kirkpatrick-Baez focusing. Acta Crystallogr. A 69(5), 490-497 (2013)

64. Maretzke, S.: Locality estimates for fresnel-wave-propagation and stability of near-field X-ray
propagation imaging with finite detectors. arXiv preprint arXiv:1805.06185 (2018)

65. Hagemann, J., Salditt, T.: The fluence-resolution relationship in holographic and coherent
diffractive imaging. J. Appl. Crystallogr. 50(2), 531-538 (2017)

66. Giewekemeyer, K., Thibault, P., Kalbfleisch, S., Beerlink, A., Kewish, C.M., Dierolf, M.,
Pfeiffer, F., Salditt, T.: Quantitative biological imaging by ptychographic X-ray diffraction
microscopy. PNAS 107(2), 529-534 (2010)

67. Wilke, R.N.: Coherent X-ray diffractive imaging on the single-cell-level of microbial samples:
ptychography, tomography, Nano-diffraction and waveguide-imaging. Ph.D. thesis,
Universität Göttingen (2014)

68. Jahn, T., Wilke, R.N., Chushkin, Y., Salditt, T.: How many photons are needed to reconstruct
random objects in coherent X-ray diffractive imaging? Acta Crystallogr. A 73(1) (2017)

69. Elser, V., Eisebitt, S.: Uniqueness transition in noisy phase retrieval. New J. Phys. 13(2),
023001 (2011)

70. Du, M., Gursoy, D., Jacobsen, C.: Near, far, wherever you are: simulations on the dose
efficiency of holographic and ptychographic coherent imaging. arXiv preprint arXiv:1908.06770
(2019)

71. Hagemann, J., Salditt, T.: Reconstructing mode mixtures in the optical near-field. Opt. Express
25(13), 13969-13973 (2017)

72. Howells, M.R., Beetz, T., Chapman, H.N., Cui, C., Holton, J.M., Jacobsen, C.J., Kirz, J.,
Lima, E., Marchesini, S., Miao, H., Sayre, D., Shapiro, D.A., Spence, J.C.H., Starodub,
D.: An assessment of the resolution limitation due to radiation-damage in X-ray diffraction
microscopy. J. Electron Spectros. Relat. Phenomena 170(1-3), 4-12 (2009)

73. Huang, X., Miao, H., Steinbrener, J., Nelson, J., Shapiro, D., Stewart, A., Turner, J., Jacobsen,
C.: Signal-to-noise and radiation exposure considerations in conventional and diffraction
X-ray microscopy. Opt. Express 17(16), 13541-13553 (2009)

74. Groso, A., Stampanoni, M., Abela, R., Schneider, P., Linga, S., Müller, R.: Phase contrast
tomography: an alternative approach. Appl. Phys. Lett. 88, 214104 (2006)

75. Paganin, D., Mayo, S.C., Gureyev, T.E., Miller, P.R., Wilkins, S.W.: Simultaneous phase and
amplitude extraction from a single defocused image of a homogeneous object. J. Microsc.
206(Pt 1), 33-40 (2002)

76. Nugent, K.A., Gureyev, T.E., Cookson, D.F., Paganin, D., Barnea, Z.: Quantitative phase
imaging using Hard X-rays. Phys. Rev. Lett. 77(14), 2961-2964 (1996)

77. Hagemann, J.: X-ray near-field holography: beyond idealized assumptions of the probe. Ph.D.
thesis, Universität Göttingen (2017)

78. Maretzke, S.: Regularized Newton methods for simultaneous Radon inversion and phase
retrieval in phase contrast tomography. arXiv preprint arXiv:1502.05073 (2015)

79. Maretzke, S., Bartels, M., Krenkel, M., Salditt, T., Hohage, T.: Regularized Newton methods
for X-ray phase contrast and general imaging problems. Opt. Express 24(6), 6490-6506 
(2016) 


374 


80. 
81. 


82. 


83. 


84. 


85. 


86. 


87. 


88. 


89. 


90. 


91. 


92. 


93. 


94. 


95. 


96. 


97. 


98. 


99. 


100. 


101. 


102. 


T. Salditt and M. Töpperwien 


Fienup, J.R.: Phase retrieval algorithms: a comparison. Appl. Opt. 21(15), 2758-2769 (1982) 
Guizar-Sicairos, M., Thurman, S.T., Fienup, J.R.: Efficient subpixel image registration algo- 
rithms. Opt. Lett. 33(2), 156-158 (2008) 

Turner, L., Dhal, B., Hayes, J., Mancuso, A., Nugent, K., Paterson, D., Scholten, R., Tran, 
C., Peele, A.: X-ray phase imaging: demonstration of extended conditions for homogeneous 
objects. Opt. Express 12(13), 2960-2965 (2004) 

Wu, X., Liu, H.: Clarification of aspects in in-line phase-sensitive X-ray imaging. Med. Phys. 
34(2), 737-743 (2007) 

Rigaku: Microfocus rotating anode X-ray generator (2018). https://www.rigaku.com/en/ 
products/protein/micromax007 

Hemberg, O., Otendal, M., Hertz, H.M.: Liquid-metal-jet anode electron-impact X-ray source. 
Appl. Phys. Lett. 83(7), 1483-1485 (2003) 

Gureyev T.E., Nesterets, Y.I., Stevenson, A.W., Miller, P.R., Pogany, A., Wilkins, S.W.: Some 
simple rules for contrast, signal-to-noise and resolution in in-line X-ray phase-contrast imag- 
ing. Opt. Express 16(5), 3223-3241 (2008) 

Reichardt, M., Frohn, J., Töpperwien, M., Nicolas, J.D., Markus, A., Alves, F., Salditt, T.: 
Nanoscale holographic tomography of heart tissue with X-ray waveguide optics. Proc. SPIE 
10391, 1039105 (2017) 

Witte, Y.D., Boone, M., Vlassenbroeck, J., Dierick, M., Hoorebeke, L.V.: Bronnikov-aided 
correction for x-ray computed tomography. J. Opt. Soc. Am. A 26(4), 890-894 (2009) 
Töpperwien, M., Krenkel, M., Quade, F., Salditt, T.: Laboratory-based X-ray phase-contrast 
tomography enables 3D virtual histology. Proc. SPIE 9964, 996401 (2016) 

Bartels, M.: Cone-beam X-ray phase contrast tomography of biological samples: optimization 
of contrast, resolution and field of view. Ph.D. thesis, Universität Göttingen (2013) 

Krenkel, M.: Cone-beam x-ray phase-contrast tomography for the observation of single cells 
in whole organs. Ph.D. thesis, Universität Göttingen (2015) 

Feldkamp, L.A., Davis, L.C., Kress, J.W.: Practical cone-beam algorithm. J. Opt. Soc. Am. 
A 1(6), 612—619 (1984) 

van Aarle, W., Palenstijn, W.J., Cant, J., Janssens, E., Bleichrodt, F., Dabravolski, A., De 
Beenhouwer, J., Batenburg, K.J., Sijbers, J.: Fast and flexible X-ray tomography using the 
ASTRA toolbox. Opt. Express 24(22), 25129-25147 (2016) 

van Aarle, W., Palenstijn, W.J., De Beenhouwer, J., Altantzis, T., Bals, S., Batenburg, K.J., 
Sijbers, J.: The ASTRA Toolbox: a platform for advanced algorithm development in electron 
tomography. Ultramicroscopy 157, 35-47 (2015) 

Vassholz, M., Koberstein-Schwarz, B., Ruhlandt, A., Krenkel, M., Salditt, T.: New X-ray 
tomography method based on the 3D radon transform compatible with anisotropic sources. 
Phys. Rev. Lett. 116, 088101 (2016) 

Lohse, L.M., Vassholz, M., Salditt, T.: Tomography with extended sources: theory, error 
estimates, and a reconstruction algorithm. Phys. Rev. A 96(6), 063804 (2017) 

Bartels, M., Hernandez, V.H., Krenkel, M., Moser, T., Salditt, T.: Phase contrast tomography 
of the mouse cochlea at microfocus X-ray sources. Appl. Phys. Lett. 103(8), 083703 (2013) 
Töpperwien, M., Gradl, R., Keppeler, D., Vassholz, M., Meyer, A., Hessler, R., Achterhold, K., 
Gleich, B., Dierolf, M., Pfeiffer, F., Moser, T., Salditt, T.: Propagation-based phase-contrast 
X-ray tomography of cochlea using a compact synchrotron source. Sci. Rep. 8, 4922 (2018) 
Krenkel, M., Töpperwien, M., Dullin, C., Alves, F., Salditt, T.: Propagation-based phase- 
contrast tomography for high-resolution lung imaging with laboratory sources. AIP Adv. 
6(3), 035007 (2016) 

Bartels, M., Krenkel, M., Cloetens, P., Möbius, W., Salditt, T.: Myelinated mouse nerves 
studied by X-ray phase contrast zoom tomography. J. Struct. Biol., 561-568 (2015) 
Lareida, A., Beckmann, F., Schrott-Fischer, A., Glueckert, R., Freysinger, W., Müller, B.: 
High-resolution X-ray tomography of the human inner ear: synchrotron radiation-based study 
of nerve fibre bundles, membranes and ganglion cells. J. Microsc. 234(1), 95-102 (2009) 
Rau, C., Robinson, I.K., Richter, C.P.: Visualizing soft tissue in the mammalian cochlea with 
coherent hard X-rays. Microsc. Res. Tech. 69(8), 660-665 (2006) 


13 Holographic Imaging and Tomography of Biological Cells and Tissues 375 


103. 


104. 


105. 


106. 
107. 


108. 


109. 


110. 


111. 


112. 


121. 


Hernandez, V.H., Gehrt, A., Reuter, K., Jing, Z., Jeschke, M., Schulz, A.M., Hoch, G., Bartels, 
M., Vogt, G., Garnham, C.W., et al.: Optogenetic stimulation of the auditory pathway. J. Clin. 
Investig. 124(3), 1114-1129 (2014) 

Rau, C., Hwang, M., Lee, W.K., Richter, C.P.: Quantitative X-ray tomography of the mouse 
cochlea. PLoS ONE 7(4), e33568 (2012) 

Richter, C.P., Shintani-Smith, S., Fishman, A., David, C., Robinson, I., Rau, C.: Imaging of 
cochlear tissue with a grating interferometer and hard X-rays. Microsc. Res. Tech. 72(12), 
902-907 (2009) 

Huang, Z., Ruth, R.D.: Laser-electron storage ring. Phys. Rev. Lett. 80, 976-979 (1998) 
Achterhold, K., Bech, M., Schleede, S., Potdevin, G., Ruth, R., Loewen, R., Pfeiffer, F.: 
Monochromatic computed tomography with a compact laser-driven X-ray source. Sci. Rep. 
3, 1313 (2013) 

Eggl, E., Schleede, S., Bech, M., Achterhold, K., Loewen, R., Ruth, R.D., Pfeiffer, F.: X-ray 
phase-contrast tomography with a compact laser-driven synchrotron source. PNAS 112(18), 
5567-5572 (2015) 

Gradl, R., Dierolf, M., Hehn, L., Giinther, B., Yildirim, A.O., Gleich, B., Achterhold, K., 
Pfeiffer, F., Morgan, K.S.: Propagation-based Phase-contrast X-ray imaging at a compact 
light source. Sci. Rep. 7, 4908 (2017) 

Schleede, S., Meinel, F.G., Bech, M., Herzen, J., Achterhold, K., Potdevin, G., Malecki, 
A., Adam-Neumair, S., Thieme, S.F., Bamberg, F., Nikolaou, K., Bohla, A., Yildirim, A.Ö., 
Loewen, R., Gifford, M., Ruth, R., Eickelberg, O., Reiser, M., Pfeiffer, F.: Emphysema diagno- 
sis using X-ray dark-field imaging at a laser-driven compact synchrotron light source. PNAS 
109(44), 17880-17885 (2012) 

Yaroshenko, A., Meinel, F.G., Bech, M., Tapfer, A., Velroyen, A., Schleede, S., Auweter, S., 
Bohla, A., Yildirim, A.Ö., Nikolaou, K., Bamberg, F., Eickelberg, O., Reiser, M.F., Pfeiffer, 
F.: Pulmonary emphysema diagnosis with a preclinical small-animal X-ray dark-field scatter- 
contrast scanner. Radiology 269(2), 427—433 (2013) 

Larsson, D.H., Lundström, U., Westermark, U.K., Arsenian Henriksson, M., Burvall, A., 
Hertz, H.M.: First application of liquid-metal-jet sources for small-animal imaging: high- 
resolution CT and phase-contrast tumor demarcation. Med. Phys. 40(2), 021909 (2013) 


3. Kirschner, D.A., Blaurock, A.E.: Organization, Phylogenetic Variations, and Dynamic Tran- 


sitions of Myelin. CRC Press (1992) 


. Siegel, G.J.: Basic Neurochemistry: Molecular, Cellular and Medical Aspects. Elsevier Aca- 


demic Press (2006) 


. Baumann, N., Pham-Dinh, D.: Biology of oligodendrocyte and myelin in the mammalian 


central nervous system. Physiol. Rev. 81, 871-927 (2001) 


. Töpperwien, M., Krenkel, M., Ruhwedel, T., Möbius, W., Pacureanu, A., Cloetens, P., Salditt, 


T.: Phase-contrast tomography of sciatic nerves: image quality and experimental parameters. 
J. Phys: Conf. Ser. 849(1), 012001 (2017) 


. Martinez, F.D.: Genes, environments, development and asthma: a reappraisal. Eur. Respir. J. 


29(1), 179-184 (2007) 


. Moreira, A.P., Hogaboam, C.M.: Macrophages in allergic asthma: fine-tuning their pro-and 


anti-inflammatory actions for disease resolution. J. Interf. Cytok. Res. 31(6), 485-491 (2011) 


. Balhara, J., Gounni, A.S.: The alveolar macrophages in asthma: a double-edged sword. 


Mucosal Immunol. 5(6), 605 (2012) 


. Mizue, Y., Ghani, S., Leng, L., McDonald, C., Kong, P., Baugh, J., Lane, S.J., Craft, J., 


Nishihira, J., Donnelly, S.C., et al.: Role for macrophage migration inhibitory factor in asthma. 
PNAS 102(40), 14410-14415 (2005) 

Dullin, C., dal Monego, S., Larsson, E., Mohammadi, S., Krenkel, M., Garrovo, C., Biffi, 
S., Lorenzon, A., Markus, A., Napp, J., Salditt, T., Accardo, A., Alves, F., Tromba, G.: 
Functionalized synchrotron in-line phase-contrast computed tomography: a novel approach 
for simultaneous quantification of structural alterations and localization of barium-labelled 
alveolar macrophages within mouse lung samples. J. Synchrotron Rad. 22(1), 143—155 (2015) 


376 


122. 


123. 


124. 


125. 


126. 


127. 


128. 


T. Salditt and M. Töpperwien 


Mbawuike, I.N., Herscowitz, H.B.: MH-S, a murine alveolar macrophage cell line: morpho- 
logical, cytochemical, and functional characteristics. J. Leukoc. Biol. 46(2), 119-127 (1989) 
Siegel, A., Sapru, H.N.: Essential Neuroscience. Point (Lippincott Williams & Wilkins). 
Wolters Kluwer Health/Lippincott Williams & Wilkins (2011) 

Azevedo, F.A.C., Carvalho, L.R.B., Grinberg, L.T., Farfel, J.M., Ferretti, R.E.L., Leite, R.E.P., 
Lent, R., Herculano-Houzel, S., et al.: Equal numbers of neuronal and nonneuronal cells make 
the human brain an isometrically scaled-up primate brain. J. Comp. Neurol. 513(5), 532-541 
(2009) 

Moosmann, J., Ershov, A., Altapova, V., Baumbach, T., Prasad, M.S., LaBonne, C., Xiao, X., 
Kashef, J., Hofmann, R.: X-ray phase-contrast in vivo microtomography probes new aspects 
of Xenopus gastrulation. Nature 497(497), 374-377 (2013) 

Walker, S.M., Schwyn, D.A., Mokso, R., Wicklein, M., Miiller, T., Doube, M., Stampanoni, 
M., Krapp, H.G., Taylor, G.K.: In Vivo time-resolved microtomography reveals the mechanics 
of the blowfly flight motor. PLoS Biol. 12(3), e1001823 (2014) 

Ruhlandt, A., Töpperwien, M., Krenkel, M., Mokso, R., Salditt, T.: Four dimensional material 
movies: high speed phase-contrast tomography by backprojection along dynamically curved 
paths. Sci. Rep. 7(1), 6487 (2017) 

Liu, C., et al.: Beyond pixels: exploring new representations and applications for motion 
analysis. Ph.D. thesis, Massachusetts Institute of Technology (2009) 


Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 
International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, 
adaptation, distribution and reproduction in any medium or format, as long as you give appropriate 
credit to the original author(s) and the source, provide a link to the Creative Commons license and 
indicate if changes were made. 


The images or other third party material in this chapter are included in the chapter’s Creative
Commons license, unless indicated otherwise in a credit line to the material. If material is not
included in the chapter’s Creative Commons license and your intended use is not permitted by
statutory regulation or exceeds the permitted use, you will need to obtain permission directly from
the copyright holder.


Chapter 14
Constrained Reconstructions in X-ray
Phase Contrast Imaging: Uniqueness,
Stability and Algorithms
Abstract This chapter considers the inverse problem of X-ray phase contrast imaging
(XPCI), as introduced in Chap. 2. We analyze how physical a priori knowledge,
e.g. of the approximate size of the imaged sample (support knowledge), affects the
inverse problem: uniqueness and, for a linearized model, even well-posedness are
shown to hold under support constraints, ensuring stability of reconstruction from
real-world noisy data. In order to exploit these theoretical insights, regularized Newton
methods are proposed as a class of reconstruction algorithms that flexibly incorporate
constraints and account for the inherent nonlinearity of XPCI. A Kaczmarz-type
variant of the approach is considered for 3D image-recovery in tomographic XPCI,
which remains applicable for large-scale data. The relevance of constraints and the
capabilities of the proposed algorithms are demonstrated by numerical reconstruction
examples.
14.1 Forward Models


We aim to describe (propagation-based) X-ray phase contrast imaging (XPCI) in the
language of inverse problems. To this end, we deduce forward operators $F : X \to Y$
that model the dependence of the measured near-field diffraction patterns (called
holograms) $I \in Y$ on the sample-characterizing parameters $f \in X$ (the sought
image). Different models $F$ are obtained for various settings of practical interest,
including X-ray phase contrast tomography (XPCT).
Fig. 14.1 Basic physical model of XPCI: incident plane waves scatter on the imaged object, which is
parametrized by a spatially varying refractive index $n$. The resulting diffraction pattern (hologram)
is recorded in the optical near-field at some distance behind the sample


14.1.1 Physical Model and Preliminaries 


The basic physical model of XPCI is detailed in Chap. 2 and summarized by Fig. 14.1:
incident monochromatic X-rays, modeled by plane waves, are scattered by the imaged
sample, which is parametrized by its spatially varying refractive index
$n(x, z) = 1 - \delta(x, z) + i\beta(x, z)$ ($\delta$, $\beta$: refractive and absorption
decrement). By the scattering interaction, a perturbation (the image) is imprinted upon
the transmitted X-ray wave-field. The intensity $I$ of the perturbed wave-field is
recorded by a detector placed at a finite distance $d > 0$ behind the sample.

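As derived in Sect. 2.1, the dependence of the hologram intensities $I$ on the
sample parameters $\delta$ and $\beta$ is given by

\[
I(x) = |\Psi(x, d)|^2 = \left| \mathcal{D}\left( \exp(-\mu - i\phi) \right)(x) \right|^2 \quad \text{for all } x \in \mathbb{R}^2,
\]
\[
\phi(x) = k \int_{\mathbb{R}} \delta(x, z) \, dz, \qquad \mu(x) = k \int_{\mathbb{R}} \beta(x, z) \, dz. \tag{14.1}
\]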

The phase and absorption images $\phi$ and $\mu$ are 2D-projections of the 3D-densities
$\delta$ and $\beta$ ($k$: X-ray wavenumber) along the incident $z$-direction. The
Fresnel propagator $\mathcal{D}$, modeling free-space propagation of the X-rays between
object and detector, is defined by

\[
\mathcal{D}(f) := \mathcal{F}^{-1}\left( m_{\mathfrak{f}} \cdot \mathcal{F}(f) \right) \quad \text{with} \quad m_{\mathfrak{f}}(\xi) := \exp\left( -i |\xi|^2 / (2\mathfrak{f}) \right). \tag{14.2}
\]

Here, $\mathcal{F}(f)(\xi) := (2\pi)^{-m/2} \int_{\mathbb{R}^m} \exp(-i \xi \cdot x) f(x) \, dx$ is the Fourier transform
and $\mathfrak{f} = k b^2 / d > 0$ is the modified¹ Fresnel number associated with the physical
length $b$, which is identified with 1 in the chosen dimensionless coordinates.
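A discrete version of the Fresnel propagator (14.2) can be sketched with FFTs. This is only a minimal illustration under assumptions that are not part of the chapter: the field is sampled on a uniform grid with spacing `dx` (in the dimensionless units of $b$), the FFT implies periodic boundary conditions, and the names `fresnel_propagate` and `fresnel_number` are hypothetical.

```python
import numpy as np


def fresnel_propagate(field, fresnel_number, dx=1.0):
    """Discrete sketch of D(f) = F^{-1}(m_f * F(f)), m_f(xi) = exp(-i |xi|^2 / (2 f)).

    Assumes a uniform grid with spacing dx and periodic boundaries (FFT).
    """
    field = np.asarray(field, dtype=complex)
    # Angular frequencies xi along every axis of the sampled field.
    freqs = [2.0 * np.pi * np.fft.fftfreq(n, d=dx) for n in field.shape]
    grids = np.meshgrid(*freqs, indexing="ij")
    xi_squared = sum(g**2 for g in grids)
    multiplier = np.exp(-1j * xi_squared / (2.0 * fresnel_number))
    return np.fft.ifftn(multiplier * np.fft.fftn(field))


rng = np.random.default_rng(0)
wave = rng.normal(size=(32, 32)) + 1j * rng.normal(size=(32, 32))
propagated = fresnel_propagate(wave, fresnel_number=0.5)
# Back-propagation uses the conjugate multiplier, i.e. the negated Fresnel number.
recovered = fresnel_propagate(propagated, fresnel_number=-0.5)
```

Because the Fourier multiplier has modulus one, this discrete propagator is unitary and maps constant fields to themselves, mirroring the properties of $\mathcal{D}$ used later in the chapter.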


¹ The classical Fresnel number is given by $b^2/(\lambda d) = \mathfrak{f}/(2\pi)$. However, using the parameter
$\mathfrak{f}$ is notationally more convenient as it avoids excessive occurrence of $2\pi$-factors.
XPCI experiments provide intensity data $I$ of the form (14.1) (up to data errors),
whereas the images $\phi$, $\mu$ are the quantities of interest. Hence, the following principal
inverse problem has to be solved:


Inverse Problem 1 (XPCI) For some set $A$, reconstruct a 2D-image $h = \mu + i\phi \in A$
from measured holograms $I$ of the form (14.1).


By rotating the object in Fig. 14.1, holograms $I_{\theta_j}$ may be acquired for different
incident directions $\theta_j \in \mathbb{S}^2 = \{x \in \mathbb{R}^3 : |x| = 1\}$ of the X-rays onto the sample (in
Fig. 14.1, the incident direction coincides with the $z$-axis). This is the setting of X-ray
phase contrast tomography (XPCT). A mathematical model will be provided
in Sect. 14.1.3. XPCT makes it possible to probe 3D-variations of the parameters $\delta$, $\beta$ beyond
mere projections $\phi$, $\mu$.


Inverse Problem 2 (XPCT) For some set $A$, recover a 3D-image $f = k\beta + ik\delta \in A$
from holograms $\{I_{\theta_j}\}$ measured under different incident directions
$\{\theta_j\} \subset \mathbb{S}^2$.


14.1.1.1 A Priori Constraints 


The set of admissible images $A$ in Inverse Problems 1 and 2 is highly relevant. In
order to facilitate and stabilize image reconstruction, the set $A$ should be restricted
as far as possible by available physical a priori knowledge:


Support constraints: real-world samples are of finite size. This implies that the
functions $f \in \{\phi, \mu, \delta, \beta\} : \mathbb{R}^m \to \mathbb{R}$ have compact support, i.e. are identically
zero outside some bounded object-domain $\Omega \subset \mathbb{R}^m$.

Non-negativity: by the physics of hard X-rays, the decrements $\delta$, $\beta$ (and thus also
$\phi$, $\mu$) are always non-negative.

Pure phase object: especially for biological samples, $\beta$ and $\mu$ are typically orders
of magnitude smaller than $\delta$ and $\phi$. Assuming a purely phase-shifting, i.e. non-absorbing,
object with $\beta = 0$, $\mu = 0$ is then a good approximation.

Homogeneous objects: as is rigorously true for samples composed of a single
material, proportionality of $\delta$ and $\beta$ (and hence of $\phi$ and $\mu$) may often be assumed.

Regularity: realistic images $\phi$, $\mu$, $\delta$, $\beta$ are not arbitrarily singular functions, but
typically have some characteristic smoothness properties.

Tomographic consistency: images $\phi$ and $\mu$ that arise as tomographic projections
of one object under different incident directions are correlated.


Focussing on support-knowledge, we study the role of such constraints on inverse 
Problems 1 and 2 and outline how to exploit them algorithmically. 
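14.1.1.2 Additional Notation


We study Inverse Problems 1 and 2 in spaces of square-integrable functions:

\[
L^2(\mathbb{R}^m) = \{ f : \mathbb{R}^m \to \mathbb{C} : \|f\|_{L^2} < \infty \}, \qquad \|f\|_{L^2}^2 := \int_{\mathbb{R}^m} |f(x)|^2 \, dx. \tag{14.3}
\]

The focus lies on functions $f \in L^2(\mathbb{R}^m)$ that have compact support $\operatorname{supp}(f)$, i.e.
that vanish outside some bounded domain $\Omega \subset \mathbb{R}^m$:

\[
\operatorname{supp}(f) \subset \Omega \quad :\Leftrightarrow \quad f|_{\mathbb{R}^m \setminus \Omega} = 0. \tag{14.4}
\]

Here, $f|_B$ denotes the restriction of $f$ to $B \subset \mathbb{R}^m$, defined by $f|_B(x) = f(x)$ if $x \in B$
and $f|_B(x) = 0$ otherwise. For $\Omega \subset \mathbb{R}^m$, we write

\[
L^2(\Omega) = \{ f \in L^2(\mathbb{R}^m) : \operatorname{supp}(f) \subset \Omega \}. \tag{14.5}
\]

Furthermore, we define spaces of real-valued $L^2$-functions:

\[
L^2(\Omega, \mathbb{R}) = \{ f \in L^2(\Omega) : \operatorname{Im}(f) = 0 \}, \tag{14.6}
\]

where $\operatorname{Re}(\cdot)$, $\operatorname{Im}(\cdot)$ denote the real and imaginary parts, respectively.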
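On a sampled grid, the restriction $f|_B$, the support condition (14.4) and the norm in (14.3) can be mimicked with boolean masks. The following is a hedged sketch; the helper names `restrict`, `has_support_in` and `l2_norm` as well as the Riemann-sum quadrature are assumptions made for illustration.

```python
import numpy as np


def restrict(f, mask):
    """Return f|_B: equal to f where mask is True and 0 elsewhere."""
    return np.where(mask, f, 0)


def has_support_in(f, mask, tol=0.0):
    """Check supp(f) inside Omega, i.e. f vanishes (up to tol) outside the mask."""
    return bool(np.all(np.abs(f[~mask]) <= tol))


def l2_norm(f, dx=1.0):
    """Riemann-sum approximation of the L^2 norm on a uniform grid."""
    return float(np.sqrt(np.sum(np.abs(f) ** 2) * dx ** f.ndim))


image = np.arange(16.0).reshape(4, 4) + 1.0
omega = np.zeros((4, 4), dtype=bool)
omega[1:3, 1:3] = True
supported = restrict(image, omega)
```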


14.1.2 Forward Operators for XPCI 


Based on Sect. 14.1.1, we introduce forward maps $F : X \to Y$ modeling different
settings of XPCI. Note that we define the maps in arbitrary dimensions $m \in \{1, 2, 3, \ldots\}$,
although the natural case is that of images and holograms in $m = 2$ dimensions. The benefit
of this will be seen in Sect. 14.3.4.1.


14.1.2.1 General Nonlinear Forward Operator 


The most general (and most challenging) XPCI-setting is the reconstruction of both
phase $\phi$ and absorption $\mu$ from a single hologram. According to (14.1), this setting
is modeled by the forward map

\[
\mathcal{N}(h) = I - 1 = \left| \mathcal{D}(\exp(-h)) \right|^2 - 1 \tag{14.7}
\]

for complex-valued images $h = \mu + i\phi$. Note that the constant background intensity
1 has been subtracted, such that $\mathcal{N}(0) = 0$. As a benefit, $\mathcal{N}$ can be analyzed as an
operator on $L^2$-spaces: for any bounded $\Omega \subset \mathbb{R}^m$,

\[
\mathcal{N} : L^2(\Omega) \to L^2(\mathbb{R}^m) \tag{14.8}
\]

is essentially² a well-defined, nonlinear operator. Moreover, it can be shown [1, 2]
that $\mathcal{N}$ is continuously Fréchet-differentiable, i.e. sufficiently smooth to admit local
linear approximations. The derivative is given by

\[
\mathcal{N}'[f] h = -2 \operatorname{Re}\left( \overline{\mathcal{D}(\exp(-f))} \cdot \mathcal{D}(\exp(-f) \cdot h) \right). \tag{14.9}
\]
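The forward map (14.7) and its derivative (14.9) can be compared numerically with a difference quotient. The sketch below is illustrative only: it assumes a unit-spaced periodic grid, takes the complex conjugate on the first factor in (14.9) (as required when differentiating a squared modulus), and uses hypothetical function names.

```python
import numpy as np


def fresnel_propagate(field, fresnel_number):
    """FFT sketch of the Fresnel propagator D on a unit-spaced periodic grid."""
    field = np.asarray(field, dtype=complex)
    freqs = [2.0 * np.pi * np.fft.fftfreq(n) for n in field.shape]
    xi_squared = sum(g**2 for g in np.meshgrid(*freqs, indexing="ij"))
    multiplier = np.exp(-1j * xi_squared / (2.0 * fresnel_number))
    return np.fft.ifftn(multiplier * np.fft.fftn(field))


def forward_nonlinear(h, fresnel_number):
    """N(h) = |D(exp(-h))|^2 - 1 for h = mu + i*phi, cf. (14.7)."""
    return np.abs(fresnel_propagate(np.exp(-h), fresnel_number)) ** 2 - 1.0


def forward_derivative(f, h, fresnel_number):
    """N'[f]h = -2 Re(conj(D(exp(-f))) * D(exp(-f) * h)), cf. (14.9)."""
    exit_wave = np.exp(-f)
    reference = fresnel_propagate(exit_wave, fresnel_number)
    perturbation = fresnel_propagate(exit_wave * h, fresnel_number)
    return -2.0 * np.real(np.conj(reference) * perturbation)


def finite_difference(f, h, fresnel_number, eps=1e-7):
    """Forward difference quotient approximating N'[f]h."""
    shifted = forward_nonlinear(f + eps * h, fresnel_number)
    return (shifted - forward_nonlinear(f, fresnel_number)) / eps
```

For small images `f` and arbitrary directions `h`, `forward_derivative` and `finite_difference` should agree up to the truncation error of the difference quotient.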


14.1.2.2 Linearized Forward Map and Contrast-Transfer-Functions 


The nonlinearity of the forward map $\mathcal{N}$ causes difficulties in both analysis and
practical image reconstruction. It is therefore standard [3-7] to resort to a linearization
valid for weakly scattering samples (see e.g. [7] for details on the regime of
validity): the idea is that the image $h$ is sufficiently “small” so that higher-order
terms are negligible:

\[
\mathcal{N}(h) = \mathcal{T}(h) + O(h^2) \approx \mathcal{T}(h) \quad \text{with} \quad \mathcal{T}(h) := -2 \operatorname{Re}(\mathcal{D}(h)). \tag{14.10}
\]

The linearized forward map $\mathcal{T} = \mathcal{N}'[0]$ is also known as the contrast-transfer-function
(CTF) model, which refers to the following alternate form (compare with
Sect. 2.2):

\[
\mathcal{T}(\mu + i\phi) = -2 \mathcal{F}^{-1}\Big( \underbrace{\sin\left( |\xi|^2 / (2\mathfrak{f}) \right)}_{=: s_0(\xi)} \mathcal{F}(\phi) + \underbrace{\cos\left( |\xi|^2 / (2\mathfrak{f}) \right)}_{=: c_0(\xi)} \mathcal{F}(\mu) \Big). \tag{14.11}
\]

According to (14.11), the linearized contrast in Fourier space is given by a superposition
of the Fourier transforms of the phase and absorption images $\phi$, $\mu$, modulated
by the oscillatory CTFs $s_0$ and $c_0$, respectively.

As $|s_0(\xi)|, |c_0(\xi)| \le 1$ for all $\xi \in \mathbb{R}^m$, $\mathcal{T} : L^2(\mathbb{R}^m) \to L^2(\mathbb{R}^m)$ is a bounded ($\mathbb{R}$-)linear
operator with $\|\mathcal{T}(h)\|_{L^2} \le 2 \|h\|_{L^2}$ for all $h \in L^2(\mathbb{R}^m)$.
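The equivalence between the definition $\mathcal{T}(h) = -2\operatorname{Re}(\mathcal{D}(h))$ and the CTF form (14.11) can be checked on a grid. The following sketch assumes a unit-spaced periodic grid and real-valued `mu` and `phi`; all function names are hypothetical.

```python
import numpy as np


def xi_squared(shape):
    """Squared angular frequencies |xi|^2 on a unit-spaced FFT grid."""
    freqs = [2.0 * np.pi * np.fft.fftfreq(n) for n in shape]
    return sum(g**2 for g in np.meshgrid(*freqs, indexing="ij"))


def linearized_forward(h, fresnel_number):
    """T(h) = -2 Re(D(h)), cf. (14.10)."""
    multiplier = np.exp(-1j * xi_squared(h.shape) / (2.0 * fresnel_number))
    return -2.0 * np.real(np.fft.ifftn(multiplier * np.fft.fftn(h)))


def ctf_forward(mu, phi, fresnel_number):
    """CTF form (14.11): -2 F^{-1}(s_0 F(phi) + c_0 F(mu)) for real mu, phi."""
    arg = xi_squared(mu.shape) / (2.0 * fresnel_number)
    spectrum = np.sin(arg) * np.fft.fftn(phi) + np.cos(arg) * np.fft.fftn(mu)
    return -2.0 * np.real(np.fft.ifftn(spectrum))


rng = np.random.default_rng(2)
mu = rng.normal(size=(16, 16))
phi = rng.normal(size=(16, 16))
direct = linearized_forward(mu + 1j * phi, fresnel_number=0.5)
via_ctf = ctf_forward(mu, phi, fresnel_number=0.5)
```

Both evaluations agree up to rounding errors, and the norm of the output stays below twice the norm of the input, in line with the bound stated above.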


14.1.2.3 Homogeneous Objects and Pure Phase Objects 


The cases of homogeneous objects and pure phase objects, see Sect. 14.1.1.1, may be
treated in a unified manner by expressing the complex-valued image
$h = \mu + i\phi = i e^{-i\nu} \varphi$ in terms of a single real-valued function $\varphi$ and a parameter
$\nu = \arctan(\beta/\delta) \in [0; \pi/2)$ ($\nu = 0$: pure phase object).

Such a homogeneity constraint may be incorporated into the general forward
model via a modified forward map

\[
\mathcal{N}_\nu : L^2(\Omega, \mathbb{R}) \to L^2(\mathbb{R}^m); \quad \varphi \mapsto \mathcal{N}(i e^{-i\nu} \varphi) = \left| \mathcal{D}\left( \exp(-i e^{-i\nu} \varphi) \right) \right|^2 - 1. \tag{14.12}
\]

² To ensure well-definedness on the whole space $L^2(\Omega)$, the exponential has to be suitably truncated
for the physically irrelevant case of negative absorption $\operatorname{Re}(f) = \mu < 0$.

The linearized model under a homogeneity constraint may be expressed via a
single CTF $s_\nu(\xi) := \sin\left( |\xi|^2 / (2\mathfrak{f}) + \nu \right)$: for $\varphi \in L^2(\mathbb{R}^m, \mathbb{R})$, it holds that

\[
\mathcal{S}_\nu(\varphi) := -2 \mathcal{F}^{-1}\left( s_\nu \cdot \mathcal{F}(\varphi) \right) = \mathcal{T}(i e^{-i\nu} \varphi). \tag{14.13}
\]

Although (14.13) only holds for real-valued $\varphi$, we define
$\mathcal{S}_\nu : L^2(\mathbb{R}^m) \to L^2(\mathbb{R}^m)$ ($\|\mathcal{S}_\nu\| = 2$) on general $L^2$-spaces. For its properties, it is largely
irrelevant whether real- or complex-valued functions are considered, as $\mathcal{S}_\nu$ commutes with the
pointwise real part: $\operatorname{Re}(\mathcal{S}_\nu(h)) = \mathcal{S}_\nu(\operatorname{Re}(h))$ for all $h \in L^2(\mathbb{R}^m)$.
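The reduction to a single CTF follows from the CTF form (14.11) via the angle-addition identity for the sine. As a short worked step, with the shorthand $a(\xi) := |\xi|^2/(2\mathfrak{f})$ introduced here only for brevity:

```latex
\begin{align*}
i e^{-i\nu} \varphi &= \sin(\nu)\,\varphi + i \cos(\nu)\,\varphi
  \quad\Rightarrow\quad \mu = \sin(\nu)\,\varphi, \quad \phi = \cos(\nu)\,\varphi, \\
\mathcal{T}(i e^{-i\nu} \varphi)
  &= -2 \mathcal{F}^{-1}\big( [\sin(a)\cos(\nu) + \cos(a)\sin(\nu)] \, \mathcal{F}(\varphi) \big)
   = -2 \mathcal{F}^{-1}\big( \sin(a + \nu) \, \mathcal{F}(\varphi) \big), \\
\mu / \phi &= \tan(\nu) = \beta / \delta .
\end{align*}
```

The last line confirms that the choice $\nu = \arctan(\beta/\delta)$ encodes exactly the assumed proportionality of absorption and phase shift.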


14.1.2.4 Multiple Holograms 


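In order to obtain richer data in XPCI, it is standard to acquire multiple holograms
$I_1, I_2, \ldots, I_\ell$ at several object-to-detector distances, corresponding to different
Fresnel numbers $\mathfrak{f}_1, \mathfrak{f}_2, \ldots, \mathfrak{f}_\ell$. This may be modeled by combining the forward
maps for the individual holograms, $F_j : X \to L^2(\mathbb{R}^m);\ h \mapsto I_j - 1$ with
$F_j \in \{\mathcal{N}^{(\mathfrak{f}_j)}, \mathcal{N}_\nu^{(\mathfrak{f}_j)}, \mathcal{T}^{(\mathfrak{f}_j)}, \mathcal{S}_\nu^{(\mathfrak{f}_j)}\}$, into a “vector-valued” operator:

\[
F^{(\mathfrak{f}_1, \ldots, \mathfrak{f}_\ell)} : X \to L^2(\mathbb{R}^m)^\ell; \quad h \mapsto (F_1(h), \ldots, F_\ell(h)). \tag{14.14}
\]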
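In a discrete setting, the vector-valued operator (14.14) simply stacks the holograms for the individual Fresnel numbers. The sketch below uses the nonlinear model for each $F_j$ and assumes a unit-spaced periodic grid; the names `forward_multi` and `fresnel_numbers` are hypothetical.

```python
import numpy as np


def fresnel_propagate(field, fresnel_number):
    """FFT sketch of the Fresnel propagator D on a unit-spaced periodic grid."""
    field = np.asarray(field, dtype=complex)
    freqs = [2.0 * np.pi * np.fft.fftfreq(n) for n in field.shape]
    xi_squared = sum(g**2 for g in np.meshgrid(*freqs, indexing="ij"))
    multiplier = np.exp(-1j * xi_squared / (2.0 * fresnel_number))
    return np.fft.ifftn(multiplier * np.fft.fftn(field))


def forward_nonlinear(h, fresnel_number):
    """N(h) = |D(exp(-h))|^2 - 1 for a single Fresnel number."""
    return np.abs(fresnel_propagate(np.exp(-h), fresnel_number)) ** 2 - 1.0


def forward_multi(h, fresnel_numbers):
    """Stack F_j(h) for all Fresnel numbers into an array of shape (l, *h.shape)."""
    return np.stack([forward_nonlinear(h, fn) for fn in fresnel_numbers])


rng = np.random.default_rng(3)
h = 0.1 * (rng.normal(size=(8, 8)) + 1j * rng.normal(size=(8, 8)))
data = forward_multi(h, fresnel_numbers=[0.2, 0.5, 1.0])
```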


14.1.3 Forward Operators for XPCT 


In X-ray phase contrast tomography (XPCT), holograms are measured under different
incident directions $\theta \in \mathbb{S}^2$. According to the basic model (14.1), the resulting
intensities $I_\theta$ are then given by

\[
I_\theta(x) = \left| \mathcal{D}\left( \exp\left( -\mathcal{P}_\theta(k\beta + ik\delta) \right) \right)(x) \right|^2, \tag{14.15}
\]

where $\mathcal{P}_\theta$ is the parallel-beam projector along $\theta$ ($\theta \perp n_x \perp n_y \perp \theta$):

\[
\mathcal{P}_\theta(f)(x, y) = \int_{\mathbb{R}} f(x n_x + y n_y + z \theta) \, dz, \quad x, y \in \mathbb{R}. \tag{14.16}
\]

According to the standard theory of computed tomography, projection data
$\{\mathcal{P}_\theta(f)\}_{\theta \in \Theta}$ for a suitable set of incident directions $\Theta$ allow
reconstruction of the underlying 3D-function $f : \mathbb{R}^3 \to \mathbb{C}$. Analogously, the goal of XPCT is to
reconstruct 3D-variations of the decrements $\delta$ and $\beta$ of the sample’s refractive index from
a tomographic series of holograms $\{I_\theta\}_{\theta \in \Theta}$.

Composition of the projectors $\mathcal{P}_\theta$ with any of the forward maps
$F \in \{\mathcal{N}, \mathcal{N}_\nu, \mathcal{T}, \mathcal{S}_\nu\} : X \to L^2(\mathbb{R}^m)$ from Sect. 14.1.2 induces a corresponding XPCT-model: for a
finite set of directions $\Theta = \{\theta_j\}$, the tomographic hologram data is modeled by

\[
F_{\mathrm{PCT}} : f \mapsto \left( F(\mathcal{P}_\theta(f)) \right)_{\theta \in \Theta} = (I_\theta - 1)_{\theta \in \Theta}. \tag{14.17}
\]
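The composition in (14.17) can be sketched for the simplest case in which the incident directions coincide with the grid axes, so that the projector (14.16) reduces to a sum along one array axis. This restriction, the Riemann-sum quadrature with step `dz`, and all function names are assumptions of this illustration; general directions $\theta$ would require interpolation or a dedicated tomography library.

```python
import numpy as np


def fresnel_propagate(field, fresnel_number):
    """FFT sketch of the Fresnel propagator D on a unit-spaced periodic grid."""
    field = np.asarray(field, dtype=complex)
    freqs = [2.0 * np.pi * np.fft.fftfreq(n) for n in field.shape]
    xi_squared = sum(g**2 for g in np.meshgrid(*freqs, indexing="ij"))
    multiplier = np.exp(-1j * xi_squared / (2.0 * fresnel_number))
    return np.fft.ifftn(multiplier * np.fft.fftn(field))


def project_along_axis(volume, axis, dz=1.0):
    """Riemann-sum line integrals of a sampled 3D function along one grid axis."""
    return np.sum(volume, axis=axis) * dz


def forward_xpct(volume, axes, fresnel_number, dz=1.0):
    """Holograms I_theta - 1 = N(P_theta(f)) for axis-aligned directions theta."""
    holograms = []
    for axis in axes:
        projection = project_along_axis(volume, axis, dz)
        wave = fresnel_propagate(np.exp(-projection), fresnel_number)
        holograms.append(np.abs(wave) ** 2 - 1.0)
    return holograms


empty_volume = np.zeros((4, 5, 6), dtype=complex)
empty_holograms = forward_xpct(empty_volume, axes=(0, 1, 2), fresnel_number=0.5)
```

Here `volume` plays the role of $f = k\beta + ik\delta$ sampled on a 3D grid; an empty volume yields vanishing contrast in every hologram.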


14.2 Uniqueness Theory 


In practice, it is highly relevant whether the measured intensity data $I$ uniquely
determines the sought image $h = \mu + i\phi$ (or $f = k\beta + ik\delta$ in XPCT). Otherwise,
two structurally different samples could be indistinguishable by the
imaging method. (Non-)uniqueness of an inverse problem is
equivalent to (non-)injectivity of the governing forward operator $F : X \to Y$. Hence,
it depends on different aspects:


1. The richness of the data, i.e. the size of the data-space $Y$: for example, it is commonly
argued that measuring several holograms $I_1, I_2, \ldots$ at different Fresnel
numbers (see Sect. 14.1.2.4) helps to ensure uniqueness in XPCI.

2. Available a priori knowledge, i.e. the size of the object-space $X$: the smaller
$X$, the more likely it is that any two images $h_1, h_2 \in X$ with $h_1 \neq h_2$ induce
distinguishable data $F(h_1) \neq F(h_2)$.


In addition, it may happen that the nonlinear forward model is unique but its linearization
is non-unique or vice versa. Accordingly, the different forward models
from Sect. 14.1.2 have to be investigated individually.


14.2.1 Preliminary Results and Counter-Examples 


We first review some known results on (non-)uniqueness of XPCI. Firstly, image
reconstruction from a single hologram is generally non-unique:


• Linearized model: $\mathcal{T} : L^2(\mathbb{R}^m) \to L^2(\mathbb{R}^m);\ h \mapsto -2\operatorname{Re}(\mathcal{D}(h))$ has a huge null-space
composed of all $h$ for which $\mathcal{D}(h)$ is purely imaginary-valued:

\[
\operatorname{kern}(\mathcal{T}) := \{ h \in L^2(\mathbb{R}^m) : \mathcal{T}(h) = 0 \} = \mathcal{D}^{-1}\left( i L^2(\mathbb{R}^m, \mathbb{R}) \right). \tag{14.18}
\]

• Nonlinear model (example from [8]): images $h_\pm : \mathbb{R}^2 \setminus \{0\} \to \mathbb{C};\ x \mapsto a(|x|) \pm i\nu \operatorname{arctan2}(x)$
for $\nu \in \mathbb{N}$ and smooth functions $a : \mathbb{R}_{\ge 0} \to \mathbb{R}$ give rise to so-called
phase vortices in the wave-field. The sign of the vortex is not determined by
Fresnel intensities ($A := \exp(-a)$):

\[
|\mathcal{D}(\exp(-h_+))|^2 = |\mathcal{D}(A \cdot \exp(-i\nu \operatorname{arctan2}(\cdot)))|^2
= |\mathcal{D}(A \cdot \exp(i\nu \operatorname{arctan2}(\cdot)))|^2 = |\mathcal{D}(\exp(-h_-))|^2. \tag{14.19}
\]
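The null-space (14.18) of the linearized model is easy to exhibit numerically: back-propagating any purely imaginary field gives a nonzero image with vanishing linearized contrast. The sketch below assumes a unit-spaced periodic grid and realizes $\mathcal{D}^{-1}$ by negating the Fresnel number; the function names are hypothetical.

```python
import numpy as np


def fresnel_propagate(field, fresnel_number):
    """FFT sketch of the Fresnel propagator D on a unit-spaced periodic grid."""
    field = np.asarray(field, dtype=complex)
    freqs = [2.0 * np.pi * np.fft.fftfreq(n) for n in field.shape]
    xi_squared = sum(g**2 for g in np.meshgrid(*freqs, indexing="ij"))
    multiplier = np.exp(-1j * xi_squared / (2.0 * fresnel_number))
    return np.fft.ifftn(multiplier * np.fft.fftn(field))


def linearized_forward(h, fresnel_number):
    """T(h) = -2 Re(D(h))."""
    return -2.0 * np.real(fresnel_propagate(h, fresnel_number))


def null_space_element(g, fresnel_number):
    """h = D^{-1}(i g) for real-valued g, so that D(h) is purely imaginary."""
    return fresnel_propagate(1j * np.asarray(g, dtype=float), -fresnel_number)


rng = np.random.default_rng(4)
g = rng.normal(size=(16, 16))
h_null = null_space_element(g, fresnel_number=0.5)
contrast = linearized_forward(h_null, fresnel_number=0.5)
```

Note that such null-space elements are in general not compactly supported, which is why support constraints can restore uniqueness.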


Based on these negative results, it is typically argued that at least two holograms
and/or a homogeneity constraint are required for uniqueness. Indeed, the situation
improves substantially in the latter settings:


• Uniqueness under homogeneity constraints (linear): the operator
$\mathcal{S}_\nu : L^2(\mathbb{R}^m) \to L^2(\mathbb{R}^m)$ from Sect. 14.1.2.3 is injective, as the zero-manifolds of the Fourier
multiplier $s_\nu$ are sets of Lebesgue measure 0 in $\mathbb{R}^m$.

• Uniqueness for two holograms (linear): in [9], it is shown by a similar argument
based on the CTF-representation (14.11) that also the operator
$\mathcal{T}^{(\mathfrak{f}_1, \mathfrak{f}_2)} : L^2(\mathbb{R}^m) \to L^2(\mathbb{R}^m)^2$ (see Sect. 14.1.2.4) is injective for $\mathfrak{f}_1 \neq \mathfrak{f}_2$.


Moreover, it is argued in [9] that both results carry over to the nonlinear model, provided
that the image $h$ is compactly supported. Indeed, a much stronger uniqueness
result holds true under such an assumption, as will be seen in the following.


14.2.2 Sources of Non-uniqueness—The Phase Problem 


According to the basic physical model (14.1), image formation mathematically
amounts to three operations: the pointwise exponential, $h \mapsto \exp(-h)$, Fresnel
propagation, $\exp(-h) \mapsto \mathcal{D}(\exp(-h))$, and computation of the pointwise squared
modulus, $\mathcal{D}(\exp(-h)) \mapsto |\mathcal{D}(\exp(-h))|^2$. Among those, $\mathcal{D}$ is an invertible operation,
i.e. does not destroy information. This is not true for the other two operations,
which give rise to different sources of non-uniqueness:


• Phase-wrapping: The exponential is $2\pi$-periodic in the imaginary part of its argument.
Hence, the phase image $\phi = \operatorname{Im}(h)$ may only be determined by the data
up to increments by multiples of $2\pi$.

• Phase problem: The squared modulus, arising from the restriction of X-ray detectors
to measuring intensities, eliminates the phase information.
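The phase-wrapping ambiguity can be made concrete: adding an arbitrary integer multiple of $2\pi$ to the phase at each pixel leaves the hologram unchanged. The following sketch assumes a unit-spaced periodic grid; `wrapped_copy` and the other names are hypothetical.

```python
import numpy as np


def fresnel_propagate(field, fresnel_number):
    """FFT sketch of the Fresnel propagator D on a unit-spaced periodic grid."""
    field = np.asarray(field, dtype=complex)
    freqs = [2.0 * np.pi * np.fft.fftfreq(n) for n in field.shape]
    xi_squared = sum(g**2 for g in np.meshgrid(*freqs, indexing="ij"))
    multiplier = np.exp(-1j * xi_squared / (2.0 * fresnel_number))
    return np.fft.ifftn(multiplier * np.fft.fftn(field))


def forward_nonlinear(h, fresnel_number):
    """N(h) = |D(exp(-h))|^2 - 1."""
    return np.abs(fresnel_propagate(np.exp(-h), fresnel_number)) ** 2 - 1.0


def wrapped_copy(h, wraps):
    """Shift the phase Im(h) by 2*pi*wraps, with integer-valued wraps per pixel."""
    return h + 2j * np.pi * np.asarray(wraps)


rng = np.random.default_rng(5)
h = 0.2 * rng.random(size=(16, 16)) + 1j * rng.random(size=(16, 16))
wraps = rng.integers(-2, 3, size=h.shape)
h_wrapped = wrapped_copy(h, wraps)
```

Both `h` and `h_wrapped` produce the same data, so the two images cannot be distinguished without a priori bounds on the phase.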


The first aspect is simpler to analyze and often turns out to be of lesser practical
impact in XPCI: for moderately strongly scattering samples, $\phi$ is a priori known to
assume values within $[0; 2\pi)$, so that non-uniqueness due to phase-wrapping is not
an issue. In the following, we therefore focus on possible ambiguities due to the
phase problem.


14.2.3 Relation to Classical Phase Retrieval Problems 


Up to possibly remaining phase-wrapping ambiguities, the image reconstruction
problem in XPCI may be rephrased as follows:

Given data $I = |\mathcal{D}(O)|^2$, reconstruct the object-transmission-function (OTF)
$O := \exp(-h) \in A$ from some admissible set $A$.


Such settings are known as phase retrieval problems, as recovering $O$ is equivalent
to retrieving the missing phase of $\mathcal{D}(O)$ (and then inverting $\mathcal{D}$). Uniqueness of phase
retrieval has been extensively studied ever since the pioneering works of Walther [10]
and Akutowicz [11, 12], primarily for the case where $\mathcal{D}$ is replaced by the Fourier
transform $\mathcal{F}$, i.e. for the reconstruction from phaseless Fourier data. We refer to
[13-17] for reviews.

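Indeed, Fresnel data may be readily reduced to the classical Fourier setting by
rewriting the Fresnel propagator in the form

\[
\mathcal{D}(f)(x) = u_0 \mathfrak{f}^{m/2} n_{\mathfrak{f}}(x) \cdot \mathcal{F}(n_{\mathfrak{f}} \cdot f)(\mathfrak{f} x) \quad \text{for all } x \in \mathbb{R}^m \tag{14.20}
\]

with $n_{\mathfrak{f}}(x) = \exp(i \mathfrak{f} |x|^2 / 2)$ and $u_0 = \exp(-i m \pi / 4)$. Hence, if we define $\tilde{O} := n_{\mathfrak{f}} \cdot O$,
then the holograms in XPCI provide Fourier data for $\tilde{O}$:

\[
I(\xi / \mathfrak{f}) = |\mathcal{D}(O)(\xi / \mathfrak{f})|^2 = \mathfrak{f}^m |\mathcal{F}(\tilde{O})(\xi)|^2 \quad \text{for all } \xi \in \mathbb{R}^m. \tag{14.21}
\]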
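For readers who wish to verify (14.20), a brief derivation from the definition (14.2) is sketched here. It uses the convolution theorem in the normalization of $\mathcal{F}$ given above and the Fresnel integral $\int_{\mathbb{R}} \exp(-i t^2 / (2\mathfrak{f})) \, dt = (2\pi\mathfrak{f})^{1/2} e^{-i\pi/4}$, understood as an oscillatory integral:

```latex
\begin{align*}
\mathcal{F}^{-1}(m_{\mathfrak{f}})(x)
  &= (2\pi)^{-m/2} \int_{\mathbb{R}^m} \exp\!\big( i \xi \cdot x - i |\xi|^2 / (2\mathfrak{f}) \big) \, d\xi
   = u_0 \, \mathfrak{f}^{m/2} \, n_{\mathfrak{f}}(x), \\
\mathcal{D}(f)(x)
  &= (2\pi)^{-m/2} \int_{\mathbb{R}^m} \mathcal{F}^{-1}(m_{\mathfrak{f}})(x - y) \, f(y) \, dy \\
  &= u_0 \, \mathfrak{f}^{m/2} \, n_{\mathfrak{f}}(x) \, (2\pi)^{-m/2}
     \int_{\mathbb{R}^m} \exp(-i \mathfrak{f} x \cdot y) \, n_{\mathfrak{f}}(y) f(y) \, dy
   = u_0 \, \mathfrak{f}^{m/2} \, n_{\mathfrak{f}}(x) \, \mathcal{F}(n_{\mathfrak{f}} \cdot f)(\mathfrak{f} x).
\end{align*}
```

Taking the squared modulus and using $|u_0| = |n_{\mathfrak{f}}(x)| = 1$ then gives (14.21).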
Based on the identification in (14.21), uniqueness results for Fourier phase
retrieval may be adapted to the Fresnel regime. Notably, however, most such
uniqueness theorems assume a compact support of the object. Importantly, this is
not justified in the setting of XPCI:


The OTF $O$ is not a compactly supported function in any realistic setting. Only
the contrast $o := O - 1$ typically has compact support.


14.2.4 Holographic Nature of Phase Retrieval in XPCI 


In order to emphasize the structural difference to classical phase retrieval problems,
it is illustrative to rewrite the XPCI-model in the form

\[
I = |\mathcal{D}(O)|^2 = |\mathcal{D}(1) + \mathcal{D}(o)|^2 = 1 + \underbrace{2\operatorname{Re}(\mathcal{D}(o))}_{= \mathcal{D}(o) + \overline{\mathcal{D}(o)}} + |\mathcal{D}(o)|^2, \tag{14.22}
\]

where it has been used that $\mathcal{D}$ maps constant functions onto themselves.³ According
to the physical model from Sect. 14.1.1, the summands on the r.h.s. of (14.22) can be
interpreted in terms of the scattered and transmitted parts of the X-ray wave-field:
the constant 1 is the intensity of the incident plane wave and the last summand that of
the waves scattered by the object, whereas the second term describes the interference
of these two wave-field components on the detector.
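The decomposition (14.22) into background, interference term and scattered intensity can be reproduced numerically. The sketch below assumes a unit-spaced periodic grid, on which the FFT-based propagator also maps constants onto themselves; the name `hologram_terms` is hypothetical.

```python
import numpy as np


def fresnel_propagate(field, fresnel_number):
    """FFT sketch of the Fresnel propagator D on a unit-spaced periodic grid."""
    field = np.asarray(field, dtype=complex)
    freqs = [2.0 * np.pi * np.fft.fftfreq(n) for n in field.shape]
    xi_squared = sum(g**2 for g in np.meshgrid(*freqs, indexing="ij"))
    multiplier = np.exp(-1j * xi_squared / (2.0 * fresnel_number))
    return np.fft.ifftn(multiplier * np.fft.fftn(field))


def hologram_terms(h, fresnel_number):
    """Return the three summands of (14.22) for the contrast o = exp(-h) - 1."""
    scattered = fresnel_propagate(np.exp(-h) - 1.0, fresnel_number)  # D(o)
    background = np.ones(h.shape)                                    # |D(1)|^2 = 1
    interference = 2.0 * np.real(scattered)                          # D(o) + conj(D(o))
    quadratic = np.abs(scattered) ** 2                               # |D(o)|^2
    return background, interference, quadratic


rng = np.random.default_rng(6)
h = 0.1 * rng.random(size=(16, 16)) + 0.5j * rng.random(size=(16, 16))
hologram = np.abs(fresnel_propagate(np.exp(-h), 0.5)) ** 2
background, interference, quadratic = hologram_terms(h, 0.5)
```

For weakly scattering samples the quadratic term is small compared with the interference term, which is the regime in which the linearized model of Sect. 14.1.2.2 applies.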

Formula (14.22) places the inverse problem of XPCI in the realm of holographic 
phase retrieval problems, i.e. reconstruction in the presence of a reference signal— 
here provided by the unscattered part of the incident X-rays. Several theoretical 
and practical works have shown that such a holographic reference facilitates phase 
retrieval, see e.g. [18-21]. 


14.2.5 General Uniqueness Under Support Constraints 


According to (14.22), image reconstruction in XPCI is equivalent to retrieving $o = \exp(-h) - 1$
from data of the form (14.22) (up to possible phase-wrapping). By
invertibility of the Fresnel-propagator $\mathcal{D}$, uniqueness thus holds if it is possible to
disentangle the summands $\mathcal{D}(o)$, $\overline{\mathcal{D}(o)}$, and $|\mathcal{D}(o)|^2$. As shown in [22] using the
theory of entire functions, the latter is indeed possible whenever o is known to have 
compact support, which is true for any sample of finite size. The principal result 
reads as follows: 


Theorem 14.1 (Uniqueness of XPCI [22]) Let $o$ ($= \exp(-h) - 1$) be a compactly
supported function (or distribution).

Then $o$ is uniquely determined by XPCI-data $I = |\mathcal{D}(1 + o)|^2$. Furthermore,
uniqueness is retained if only restricted data $I|_K$ is available, measured for any
detection-domain $K \subset \mathbb{R}^m$ that contains an open set.

For any such $K$ and $\Omega \subset \mathbb{R}^m$ bounded, the restricted forward operator $h \mapsto \mathcal{N}(h)|_K$ is injective on $L^2(\Omega)$ up to
phase-wrapping: if $\mathcal{N}(h_1)|_K = \mathcal{N}(h_2)|_K$ for $h_1, h_2 \in L^2(\Omega)$, then

$$h_1(x) - h_2(x) \in 2\pi \mathrm{i}\, \mathbb{Z} \quad \text{for almost all } x \in \mathbb{R}^m. \qquad (14.23)$$


Importantly, Theorem 14.1 establishes uniqueness in the most challenging setting 
of XPCI: single hologram, no homogeneity-constraint. The result trivially extends 
to every less difficult case with more data or additional constraints. However, note 
that the extension of uniqueness to restricted measurements $I|_K$ is based on analytic
continuation of the data—a very unstable procedure in practice. 


³ Note that this behavior of $\mathcal{D}$ is fundamentally different from that of the Fourier-transform $\mathcal{F}$, which
maps constants to Dirac-deltas centered at the origin.


14.2.5.1 Uniqueness for the Linearized Model


Uniqueness for the linearized XPCI-model has to be shown individually. According
to Sect. 14.1.2.2, it corresponds to data of the form $I_{\mathrm{lin}} = 1 - \mathcal{D}(h) - \overline{\mathcal{D}(h)}$. Compared
to (14.22), merely the quadratic term $|\mathcal{D}(o)|^2$ is omitted and $o = \exp(-h) - 1$
is replaced by $-h$ (note that this rules out phase-wrapping!). Hence, the principal
uniqueness argument from [22] remains valid: the summands $\mathcal{D}(h)$ and $\overline{\mathcal{D}(h)}$ may
be disentangled owing to their different “finger-prints” as entire functions:


Corollary 14.1 (Uniqueness of linearized XPCI [22]) For any bounded domain
$\Omega \subset \mathbb{R}^m$ and any $K \subset \mathbb{R}^m$ that contains an open set, the linearized forward operator
$h \mapsto \mathcal{T}(h)|_K$ is injective on $L^2(\Omega)$.


14.2.5.2 Uniqueness for XPCT 


By combining with standard results on uniqueness of tomographic reconstruction 
described by the theory of the Radon transform, the uniqueness theorems may be 
easily extended to XPCT. We refer to [22] for details. 


14.3 Stability Theory 


The uniqueness results of the preceding Sect. 14.2, surprisingly strong though they are,
do not guarantee that accurate images may actually be reconstructed from holograms
acquired in real-world XPCI-setups. Experimental data always contains errors due
to noise and/or inaccuracies of the physical model. As detailed in Chap. 5, such data
errors may lead to arbitrarily strongly corrupted images due to the phenomenon of
ill-posedness: even if a forward model $F : X \to Y$ is injective, its inverse $F^{-1} : F(X) \to X$
may be discontinuous such that small perturbations in the data $g^{\mathrm{obs}} = F(f) + \varepsilon$
may be arbitrarily amplified in the reconstruction $F^{-1}(F(f) + \varepsilon)$.

The aim of this section is thus to supplement the uniqueness results with an
analysis of stability, exploring how susceptible image reconstruction is to data errors.
Thereby, it sheds light on the question of which reconstructions are feasible in practice.
Due to difficulties arising from nonlinearity, the stability analysis is restricted to the
linearized forward models.


14.3.1 Lipschitz-Stability and its Meaning 


Although other (weaker) concepts of stability are common in the field of inverse
problems, the notion of Lipschitz-stability turns out to be most suitable for XPCI: a
forward map $F : X \to Y$ between normed spaces $X$, $Y$ is said to be Lipschitz-stable
if a stability estimate of the form

$$\|F(f_1) - F(f_2)\|_Y \geq C_{\mathrm{stab}}\, \|f_1 - f_2\|_X \quad \text{for all } f_1, f_2 \in X \qquad (14.24)$$


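holds for some constant $C_{\mathrm{stab}} > 0$. In this case, $F$ has a Lipschitz-continuous inverse
$F^{-1}$: $\|F^{-1}(g_1) - F^{-1}(g_2)\|_X \leq C_{\mathrm{stab}}^{-1}\, \|g_1 - g_2\|_Y$ for all $g_1, g_2 \in F(X)$. Notably,
this implies robustness to data errors: given measurements $g^{\varepsilon} = F(f^{\dagger}) + \varepsilon \in F(X)$,
the resulting reconstruction-error is bounded by


$$\|f^{\varepsilon} - f^{\dagger}\|_X = \|F^{-1}(g^{\varepsilon}) - F^{-1}(F(f^{\dagger}))\|_X \leq C_{\mathrm{stab}}^{-1}\, \|\varepsilon\|_Y. \qquad (14.25)$$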


The bound (14.25) states that data errors are amplified at most by a finite
factor $C_{\mathrm{stab}}^{-1}$ in the recovered object. Therefore $C_{\mathrm{stab}}$ should be as large as possible: if
$C_{\mathrm{stab}} \ll 1$, the error-amplification predicted by (14.25) may be too large to guarantee
accurate reconstructions at realistic noise-levels $\|\varepsilon\|_Y$.


Notably, for linear forward models $F : X \to Y$, (14.24) is equivalent to


$$C_{\mathrm{stab}} = \inf_{f \in X,\ \|f\|_X = 1} \|F(f)\|_Y > 0. \qquad (14.26)$$


Moreover, a linear inverse problem is well-posed if and only if (14.26) holds. 
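
For a discretized linear forward map, the constant in (14.26) is the smallest singular value of the matrix, and the error bound (14.25) can be observed directly. The following is a minimal sketch under that assumption; the random matrix `A` merely stands in for an injective forward operator and is not an XPCI model.

```python
import numpy as np


def stability_constant(A):
    """C_stab = inf over unit vectors of ||A f||: the smallest singular value of A.

    Valid as written for matrices with at least as many rows as columns.
    """
    return np.linalg.svd(A, compute_uv=False).min()


rng = np.random.default_rng(0)
A = rng.standard_normal((60, 40))          # hypothetical injective linear forward map
c_stab = stability_constant(A)

f_true = rng.standard_normal(40)
eps = 1e-3 * rng.standard_normal(60)       # data error
g_obs = A @ f_true + eps

# Least-squares inversion: the reconstruction error equals pinv(A) @ eps,
# whose norm is bounded by ||eps|| / C_stab as in (14.25).
f_rec = np.linalg.lstsq(A, g_obs, rcond=None)[0]
error = np.linalg.norm(f_rec - f_true)
bound = np.linalg.norm(eps) / c_stab
print(c_stab, error, bound)
```

Since the perturbed data need not lie in the range of `A`, the sketch inverts in the least-squares sense; the bound still holds because the pseudo-inverse has norm $C_{\mathrm{stab}}^{-1}$.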


14.3.2 Stability for General Objects and one Hologram 


Firstly, we consider the most challenging setting of reconstructing arbitrary phase-
and absorption-images $\phi$, $\mu$ from a single (linearized) hologram $I \approx 1 + \mathcal{T}(h)$. Stable
inversion of the forward map $\mathcal{T}$ is commonly argued to be infeasible. Indeed,
as seen in Sect. 14.2.1, the forward map is not even injective for general images
$\phi, \mu \in L^2(\mathbb{R}^m)$, but only if $\phi$, $\mu$ are compactly supported. Accordingly, we assume
a support constraint in the following:


$$h = \mu + \mathrm{i}\phi \in L^2(\Omega) \quad \text{for some } \Omega \subset \mathbb{R}^m \text{ bounded.} \qquad (14.27)$$


14.3.2.1 Analytical Approach 

Our approach to analyzing stability is ultimately based on the principle of holographic 
reconstruction [23], that earned DENNIS GABOR the Nobel Prize in physics in 1971. 
The idea is to rewrite the forward map in the form 


$$-\mathcal{T}(h) = 2\,\mathrm{Re}(\mathcal{D}(h)) = \mathcal{D}(h) + \overline{\mathcal{D}(h)} = \mathcal{D}(h) + \mathcal{D}^{-1}(\overline{h}), \qquad (14.28)$$


which reveals linearized XPCI data to be a superposition of a propagated image
$\mathcal{D}(h)$ and the back-propagated twin-image $\overline{h}$. Applying the Fresnel-propagator $\mathcal{D}$ to
a hologram thus recovers the twin-image $\overline{h}$, perturbed by a fringe-pattern originating
from the doubly propagated image $\mathcal{D}^2(h)$:


$$-\mathcal{D}(\mathcal{T}(h)) = \mathcal{D}^2(h) + \overline{h}. \qquad (14.29)$$


Fig. 14.2 Idea of the stability analysis of $\mathcal{T}$ [24]: by applying the propagator $\mathcal{D}$ to linear XPCI-data
$I - 1 \approx \mathcal{T}(h)$, the twin-image $\overline{h}$ becomes sharp, see logo in central panel. By restricting to the
complement of $\Omega \supset \operatorname{supp}(h)$, $\overline{h}$ is eliminated and incomplete Fresnel data $\mathcal{D}^2(h)|_{\Omega^c}$ is obtained
(right panel). Images show real parts of images computed from a hologram (left panel, $1 - I = \mathcal{D}(h) + \mathcal{D}^{-1}(\overline{h})$) acquired at
GINIX [25, 26], P10-beamline, DESY


This is Gabor’s original idea of holographic reconstruction, which is illustrated
in the first and second panel of Fig. 14.2 for a real-data example.
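
The identity (14.29), and its restriction to the complement of the support, can be reproduced in a few lines. The following is a minimal sketch, assuming an FFT-based propagator with the Fourier-multiplier $\exp(-\mathrm{i}\xi^2/(2f))$ and periodic boundaries; all function names and the random test image are hypothetical.

```python
import numpy as np


def fresnel_propagate(h, fresnel_number, pixel_size):
    """Apply D = F^{-1} exp(-i |xi|^2 / (2 f)) F to a 2D array (assumed convention)."""
    ny, nx = h.shape
    xi_y = 2.0 * np.pi * np.fft.fftfreq(ny, d=pixel_size)
    xi_x = 2.0 * np.pi * np.fft.fftfreq(nx, d=pixel_size)
    xi_sq = xi_y[:, None] ** 2 + xi_x[None, :] ** 2
    kernel = np.exp(-1j * xi_sq / (2.0 * fresnel_number))
    return np.fft.ifft2(kernel * np.fft.fft2(h))


def linearized_contrast(h, fresnel_number, pixel_size):
    """T(h) = -2 Re(D(h)), i.e. the linearized contrast I - 1 as in (14.28)."""
    return -2.0 * fresnel_propagate(h, fresnel_number, pixel_size).real


rng = np.random.default_rng(1)
n, field_of_view, fresnel_number = 64, 4.0, 30.0
pixel_size = field_of_view / n
x = (np.arange(n) - n // 2) * pixel_size
support = (x[:, None] ** 2 + x[None, :] ** 2) < 0.25      # Omega: disk of diameter 1
h = support * (0.02 * rng.random((n, n)) + 0.2j * rng.random((n, n)))

data = linearized_contrast(h, fresnel_number, pixel_size)
reconstruction = -fresnel_propagate(data, fresnel_number, pixel_size)     # -D(T(h))
doubly_propagated = fresnel_propagate(
    fresnel_propagate(h, fresnel_number, pixel_size), fresnel_number, pixel_size
)                                                                          # D^2(h)

# Sharp twin-image conj(h) plus fringes D^2(h), as in (14.29).
identity_error = np.max(np.abs(reconstruction - (doubly_propagated + np.conj(h))))
# Outside the support the twin-image vanishes, so only D^2(h) remains.
outside_error = np.max(np.abs(reconstruction[~support] - doubly_propagated[~support]))
# Restricting after the unitary propagator cannot increase the L2-norm.
norm_restricted = np.linalg.norm(reconstruction[~support])
norm_data = np.linalg.norm(data)
print(identity_error, outside_error, norm_restricted, norm_data)
```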

For stability analysis, we use the idea in a converse manner. By the constraint $h \in L^2(\Omega)$,
the sharp twin-image (the valuable part in Gabor’s eyes!) can be eliminated
from (14.29) by restricting to the complement $\Omega^c$ of $\Omega$:


$$-\mathcal{D}(\mathcal{T}(h))|_{\Omega^c} = \mathcal{D}^2(h)|_{\Omega^c} + \underbrace{\overline{h}|_{\Omega^c}}_{=\,0} \qquad (14.30)$$


By (14.30), we are left with incomplete (but phased!) Fresnel-data $\mathcal{D}^2(h)|_{\Omega^c}$.

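Notably, to this point, only stable operations have been applied to the XPCI-data,
which do not amplify data errors in $L^2$-norm: for any $g \in L^2(\mathbb{R}^m)$, it holds that
$\|\mathcal{D}(g)|_{\Omega^c}\|_{L^2} \leq \|g\|_{L^2}$. When applied to (14.30) this bound yields


$$\|\mathcal{T}(h)\|_{L^2} \geq \|\mathcal{D}(\mathcal{T}(h))|_{\Omega^c}\|_{L^2} = \|\mathcal{D}^2(h)|_{\Omega^c}\|_{L^2}. \qquad (14.31)$$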


Finally, by employing the alternate form (14.20) of D, the Fresnel-data on the 
r.h.s. of (14.31) may be identified with incomplete Fourier-data: 


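$$\|\mathcal{T}(h)\|_{L^2} \geq \|\mathcal{D}^2(h)|_{\Omega^c}\|_{L^2} = \|\mathcal{F}(\tilde{h})|_{\Omega_f^c}\|_{L^2} \qquad (14.32)$$


with $\tilde{h} := n_{f/2} \cdot h$ and $\Omega_f := \{x \in \mathbb{R}^m : (2/f) \cdot x \in \Omega\}$.


14.3.2.2 Stability Bound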


Since $\|\tilde{h}\|_{L^2} = \|h\|_{L^2}$, the bound (14.32) can be regarded as a relative stability estimate:
recovering an image $h \in L^2(\Omega)$ from XPCI-data $\mathcal{T}(h)$ is at least as stable
as the reconstruction of $\tilde{h} \in L^2(\Omega)$ from Fourier-data outside the domain $\Omega_f$.
Reconstruction from incomplete Fourier-data in turn is a well-studied problem: an
uncertainty principle from [27] implies that Lipschitz-stability holds,
$\|\mathcal{F}(\tilde{h})|_{\Omega_f^c}\|_{L^2} \geq C\, \|\tilde{h}\|_{L^2}$ for some constant $C > 0$, provided that $\Omega$ is bounded along at least one dimension.


For rectangular domains $\Omega$, this stability-constant may be expressed in terms
of the principal eigenvalue of a compact selfadjoint operator, for which asymptotics
are derived in [28]. Via (14.32), these results yield stability estimates for linearized
XPCI:

Theorem 14.2 (Stability estimate for general images [24]) Let $\Omega = [-\tfrac{1}{2}, \tfrac{1}{2}]^m$. Then


$$C_{\mathrm{stab}}(\Omega, f) := \inf_{h \in L^2(\Omega),\ \|h\| = 1} \|\mathcal{T}(h)\| > 0, \qquad (14.33)$$


i.e. the reconstruction of images $h$ with support in $\Omega$ from linearized XPCI-data is
well-posed. For $f \to \infty$, the stability constant satisfies the bound


CHC, f) = m3 (nf) (1 - m + o(r)) exp (—f/8) . (14.34) 


While Theorem 14.2 only gives a worst-case bound on the data-contrast
$\|\mathcal{T}(h)\| / \|h\|$ over all images $h$, the result may be sharpened considerably, as detailed
in [24]: for any $h \in L^2(\Omega)$, an individual lower bound for $\|\mathcal{T}(h)\|$ may be given
based on the eigenvalues from [28] and the images that minimize $\|\mathcal{T}(h)\| / \|h\|$ may
be characterized in terms of the associated eigenmodes.


14.3.2.3 Stability in a Practical Sense? 


Numerical computations in [24] indicate that the bound (14.34) is quite sharp. While 
this is good news for a (pure) mathematician, it is bad news from an applied perspective:
the predicted (quasi) exponential decay $C_{\mathrm{stab}}(\Omega, f) \sim \exp(-f/8)$ implies that the
constant quickly becomes very small for larger values of f, e.g. CHp(2, f) s 107, 
for $f \gtrsim 100$. Notably, $f = k b^2/d$ is the modified Fresnel-number associated with the
width of the support-domain , i.e. with the diameter of the imaged sample.* In typi- 
cal XPCI-experiments at synchrotrons, one has 10? < f < 10°, so that Theorem 14.2 
only guarantees stability in practice for imaging settings at the lower end of typical 
Fresnel-numbers. 


⁴ The lateral lengthscale $b$ associated with $f$ is implicitly fixed to the width of $\Omega$ by assuming the
latter to be 1 in Theorem 14.2, as will also be done in all subsequent results.

Notably, this is in line with empirics: after all, independent reconstruction of
phase- and absorption-image $\phi$ and $\mu$ from a single hologram, as analyzed here, is
widely considered infeasible by practitioners. It is thus highly surprising in the first
place that the problem is technically well-posed at all.


14.3.2.4 Extension to Other Domains 


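Theorem 14.2 seemingly only applies to a very particular choice of the domain
$\Omega \subset \mathbb{R}^m$. Yet, it may be readily generalized via the following properties:


• Translation- and rotation-invariance: As the map $\mathcal{T}$ is invariant under shifts and/or
rotations of the coordinates, it holds that $C_{\mathrm{stab}}(\tilde{\Omega}, f) = C_{\mathrm{stab}}(\Omega, f)$ whenever $\tilde{\Omega}$ is
a shifted and/or rotated version of $\Omega \subset \mathbb{R}^m$.

• Monotonicity: $C_{\mathrm{stab}}(\Omega_1, f) \geq C_{\mathrm{stab}}(\Omega_2, f)$ for any $\Omega_1 \subset \Omega_2 \subset \mathbb{R}^m$.

• Scaling: $C_{\mathrm{stab}}(r \cdot \Omega, f) = C_{\mathrm{stab}}(\Omega, r^2 f)$ for any $\Omega \subset \mathbb{R}^m$ and $r > 0$.
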


Analogous properties hold for the stability constants in Sect. 14.3.3. 
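
The scaling property can be verified directly. The following derivation is a sketch, assuming that the Fresnel-propagator acts as the Fourier-multiplier $\exp(-\mathrm{i}\xi^2/(2f))$; the subscripts in $\mathcal{D}_f$ and $\mathcal{T}_f$ are introduced only here to make the Fresnel-number explicit, and $h_r$ is a rescaled copy of an image $h \in L^2(\Omega)$.

```latex
\begin{align*}
h_r(x) &:= h(x/r), \qquad \operatorname{supp}(h_r) \subset r \cdot \Omega, \qquad
\mathcal{F}(h_r)(\xi) = r^m\, \mathcal{F}(h)(r\xi), \\
\mathcal{F}\big(\mathcal{D}_f(h_r)\big)(\xi)
  &= \exp\!\Big(-\frac{\mathrm{i}\,\xi^2}{2f}\Big)\, r^m\, \mathcal{F}(h)(r\xi)
   = r^m \Big[\exp\!\Big(-\frac{\mathrm{i}\,(\cdot)^2}{2 r^2 f}\Big)\, \mathcal{F}(h)\Big](r\xi), \\
\Longrightarrow \quad \mathcal{D}_f(h_r)(x) &= \mathcal{D}_{r^2 f}(h)(x/r), \qquad
\frac{\|\mathcal{T}_f(h_r)\|_{L^2}}{\|h_r\|_{L^2}}
   = \frac{\|\mathcal{T}_{r^2 f}(h)\|_{L^2}}{\|h\|_{L^2}} .
\end{align*}
```

The last equality uses $\mathcal{T}_f = -2\,\mathrm{Re}\,\mathcal{D}_f$ and the fact that the dilation $x \mapsto x/r$ rescales numerator and denominator by the same factor $r^{m/2}$. Taking the infimum over $h \in L^2(\Omega)$ relates the stability constant on $r \cdot \Omega$ at Fresnel-number $f$ to the one on $\Omega$ at Fresnel-number $r^2 f$, in line with $f = k b^2/d$ scaling quadratically in the lateral lengthscale.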


14.3.3 Homogeneous Objects and Multiple Holograms 


In most practical works, one aims to stabilize image reconstruction in XPCI by one 
of the following approaches (often both, actually): 


1. Impose a homogeneity-constraint, e.g. assuming a pure phase object $h = \mathrm{i}\phi$ if
absorption is negligible ($\mu \approx 0$), see Sect. 14.1.2.3.
2. Reconstruct from more than one hologram, see Sect. 14.1.2.4.


According to Sect. 14.2.1, uniqueness then also holds without support constraints, but
image reconstruction is still ill-posed in general: the associated forward maps
$\mathcal{S}_\nu : L^2(\mathbb{R}^m) \to L^2(\mathbb{R}^m)$ and $\mathcal{T}^{(f_1, \ldots, f_\ell)} : L^2(\mathbb{R}^m) \to L^2(\mathbb{R}^m)^\ell$ do not have a bounded
inverse due to zeros of the CTFs.

When both homogeneity- and support constraints can be assumed, well-posedness 
holds true with an improved stability constant compared to (14.34): 


Theorem 14.3 (Stability estimate for homogeneous objects [24]) Let $\Omega \subset \mathbb{R}^m$ be a
ball of diameter 1, w.l.o.g. $\Omega = \{x \in \mathbb{R}^m : |x| < \tfrac{1}{2}\}$. Then


$$C^{\mathrm{hom}}_{\mathrm{stab}}(\Omega, f, \nu) := \inf_{\phi \in L^2(\Omega),\ \|\phi\|_{L^2} = 1} \|\mathcal{S}_\nu(\phi)\|_{L^2} \geq \max\big\{ \min\{c_1,\, c_2 f^{-1}\},\ \min\{c_3 \nu,\, c_4 f^{-1/2}\} \big\} \qquad (14.35)$$


for some constants $c_i > 0$ that depend only on $m$.

By Theorem 14.3, the original decay $C_{\mathrm{stab}} \sim \exp(-f/8)$ of the stability constant
as $f \to \infty$ improves to $C^{\mathrm{hom}}_{\mathrm{stab}}(\Omega, f, \nu) \sim f^{-\gamma}$ with $\gamma = 1$ for $\nu = 0$ and $\gamma = 1/2$ for
$\nu > 0$. This ensures practical stability also at larger Fresnel-numbers.

A similar improvement applies for the reconstruction of general objects (no 
homogeneity-constraint) from two holograms: 


Theorem 14.4 (Stability estimate for two holograms [24]) Let
$\mathcal{T}^{(f_1, f_2)} : h \mapsto (\mathcal{T}^{(f_1)}(h), \mathcal{T}^{(f_2)}(h))$ denote the linearized XPCI-model for two holograms at
Fresnel-numbers $f_1 \neq f_2$ (see Sect. 14.1.2.4). Let $\mathcal{S}_0$ be the forward map from Theorem 14.3
for $\nu = 0$ and $f = f_- := |f_1^{-1} - f_2^{-1}|^{-1}$. Then


$$\|\mathcal{T}^{(f_1, f_2)}(h)\|_{L^2} \geq 2^{-1/2}\, \|\mathcal{S}_0(h)\|_{L^2} \quad \text{for all } h \in L^2(\mathbb{R}^m). \qquad (14.36)$$


In particular, for any support-domain $\Omega \subset \mathbb{R}^m$, the following stability estimate holds
true:

$$\inf_{h \in L^2(\Omega),\ \|h\|_{L^2} = 1} \|\mathcal{T}^{(f_1, f_2)}(h)\|_{L^2} \geq 2^{-1/2}\, C^{\mathrm{hom}}_{\mathrm{stab}}(\Omega, f_-, 0). \qquad (14.37)$$


Note that the r.h.s. of the stability bound (14.37) increases with the difference $f_-^{-1}$
between the reciprocal Fresnel-numbers $f_1^{-1}$, $f_2^{-1}$. Improved stability is thus guaranteed
only if $f_1$ and $f_2$ differ strongly, i.e. if the two holograms are acquired in
significantly different experimental setups.
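
The two-hologram estimate can be probed numerically on a periodic grid. The following is a minimal sketch under two assumptions that are not spelled out in this section: the propagator is the Fourier-multiplier $\exp(-\mathrm{i}\xi^2/(2f))$, and the pure-phase forward map acts as the Fourier-multiplier $2\sin(\xi^2/(2f))$. The constant $2^{-1/2}$ is the one that follows from these conventions; function names and Fresnel-numbers are illustrative.

```python
import numpy as np


def frequency_grid_sq(shape, pixel_size):
    """|xi|^2 on the FFT frequency grid."""
    ny, nx = shape
    xi_y = 2.0 * np.pi * np.fft.fftfreq(ny, d=pixel_size)
    xi_x = 2.0 * np.pi * np.fft.fftfreq(nx, d=pixel_size)
    return xi_y[:, None] ** 2 + xi_x[None, :] ** 2


def linearized_contrast(h, fresnel_number, pixel_size):
    """T(h) = -2 Re(D(h)) with D the multiplier exp(-i |xi|^2 / (2 f)) (assumed)."""
    xi_sq = frequency_grid_sq(h.shape, pixel_size)
    kernel = np.exp(-1j * xi_sq / (2.0 * fresnel_number))
    return -2.0 * np.fft.ifft2(kernel * np.fft.fft2(h)).real


def pure_phase_map(h, fresnel_number, pixel_size):
    """Assumed form of S_0: the Fourier-multiplier 2 sin(|xi|^2 / (2 f))."""
    xi_sq = frequency_grid_sq(h.shape, pixel_size)
    multiplier = 2.0 * np.sin(xi_sq / (2.0 * fresnel_number))
    return np.fft.ifft2(multiplier * np.fft.fft2(h))


rng = np.random.default_rng(2)
n, pixel_size = 64, 4.0 / 64
h = 0.02 * rng.random((n, n)) + 0.2j * rng.random((n, n))    # h = mu + i*phi
f1, f2 = 40.0, 100.0
f_minus = 1.0 / abs(1.0 / f1 - 1.0 / f2)

two_hologram_norm = np.sqrt(
    np.linalg.norm(linearized_contrast(h, f1, pixel_size)) ** 2
    + np.linalg.norm(linearized_contrast(h, f2, pixel_size)) ** 2
)
lower_bound = 2.0 ** -0.5 * np.linalg.norm(pure_phase_map(h, f_minus, pixel_size))
print(two_hologram_norm, lower_bound)
```

The inequality reflects a pointwise fact in Fourier-space: the two CTF-pairs $(\cos, \sin)$ at the phases $\xi^2/(2f_1)$ and $\xi^2/(2f_2)$ form a $2 \times 2$ matrix whose smallest singular value is bounded below by $2^{-1/2}\,|\sin(\xi^2/(2f_-))|$.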


14.3.3.1 Order-Optimality 


For $\nu = 0$, it can be shown that the $\sim f^{-1}$ order of the decay in Theorem 14.3 cannot
be improved: for a fixed bounded domain $\Omega \subset \mathbb{R}^m$ with non-empty interior, there
exists a constant $C_{\max}(\Omega) > 0$ such that

$$C^{\mathrm{hom}}_{\mathrm{stab}}(\Omega, f, 0) \leq C_{\max}(\Omega)\, f^{-1}. \qquad (14.38)$$

This is a consequence of the bound $\|\mathcal{S}_0(\phi)\|_{L^2} \leq (1/f)\, \|\Delta \phi\|_{L^2}$, where $\Delta$ is the
Laplacian, which in turn follows from $|s_0(\xi)| \leq \xi^2/(2f)$ for all $\xi \in \mathbb{R}^m$.
Notably, better rates do not even hold in a setting with multiple holograms: for
any Fresnel-numbers $f_1, \ldots, f_\ell$, it holds by a similar argument that
t 3 
inf | < wta = u (14.39) 
eL2(@),|hpll2=1 17 Al = di 


The reason for this surprising negative result on the benefit of multiple holograms
is that the CTFs $\xi \mapsto \sin(\xi^2/(2 f_j))$ all share a second order zero at $\xi = 0$. This
corresponds to the well-known low-frequency instability of XPCI that gives rise to
the proven $f^{-1}$-rates of the stability constant.


14.3.3.2 Numerical Stability Computations 


Unlike in the setting of Theorem 14.2, the prediction (14.35) for the stability
constant $C^{\mathrm{hom}}_{\mathrm{stab}}$ (and thus also the two-hologram bound (14.37)) is far from sharp if the analytical bounds on the
constants $c_i$ from [24] are inserted. Sharp values of $C^{\mathrm{hom}}_{\mathrm{stab}}$ may however be computed
numerically by approximating the minimum singular value of the operator $\mathcal{S}_\nu$ via
techniques presented in [29, Sect. 3.4].


14.3.4 Extensions 


14.3.4.1 Phase Contrast Tomography 


Although the physical setting of XPCI corresponds to $m = 2$ dimensions, the stability
results in Theorems 14.2 to 14.4 have been formulated for arbitrary $m$. As a
benefit, stability may be readily extended to XPCT: for the considered linearized
forward models, XPCT data is of the form $I_\theta - 1 = T(\mathcal{P}_\theta(f))$ for $T \in \{\mathcal{T}, \mathcal{S}_\nu\}$
and incident directions $\theta \in \Theta \subset S^2$, compare Sect. 14.1.3. As noted in [30, 31], the
order of the projector $\mathcal{P}_\theta$ and $T$ may be interchanged:


$$I_\theta - 1 = T(\mathcal{P}_\theta(f)) = \mathcal{P}_\theta(T^{(3\mathrm{D})}(f)), \qquad (14.40)$$


where $T^{(3\mathrm{D})} \in \{\mathcal{T}^{(3\mathrm{D})}, \mathcal{S}_\nu^{(3\mathrm{D})}\}$ is the equivalent of $T$ in $m = 3$ dimensions.

As detailed in [29, Sect. 3.3], the relation (14.40) makes it possible to express stability of
linearized XPCT via known results for tomographic reconstruction, combined with
stability bounds for $\mathcal{T}^{(3\mathrm{D})}, \mathcal{S}_\nu^{(3\mathrm{D})} : L^2(\Omega) \to L^2(\mathbb{R}^3)$ where $\Omega \subset \mathbb{R}^3$. Stability then
depends on a three-dimensional support constraint $\operatorname{supp}(\beta + \mathrm{i}\delta) \subset \Omega \subset \mathbb{R}^3$ for the
imaged sample’s refractive index.


14.3.4.2 Imaging with Finite Detectors 


There are a number of idealizing assumptions underlying the obtained stability
estimates: in addition to the neglected nonlinearity and idealizations in the basic
physical model such as full coherence, it has also been assumed that the hologram
$I$ is measured in the whole detector-plane in Fig. 14.1. Due to the finite size of
real-world detectors (and, more fundamentally, the finite width of illuminating X-ray
beams), however, only restrictions $I|_K$ to some bounded domain $K$ are available in
practice.

According to Theorem 14.1, such restricted data has no impact on uniqueness
(if $K$ contains an open set). The situation is quite different in terms of stability, as
analyzed in [32]: for any bounded $K \subset \mathbb{R}^m$, however large, the inverse problem
of XPCI becomes severely ill-posed, i.e. Lipschitz-stability is lost so that data errors
may severely corrupt the reconstructed images. Yet, it is also proven in [32] that the
situation may be repaired by restricting to images $h = \mu + \mathrm{i}\phi$ of finite resolution
(smoothness constraint in the sense of Sect. 14.1.1.1): by imposing that the $h$ are
B-splines on a Cartesian grid of sufficiently large spacing $r(\Omega, K, f) > 0$ (i.e. pixelated
images in some sense), Lipschitz-stability can be restored in the finite-detector
setting. Physically, the necessity of such a restriction corresponds to a resolution
limit that arises due to the finite numerical aperture associated with the detector size.


14.4 Regularized Newton Methods for XPCI 


The following section considers regularized Newton-type methods for image recon- 
struction in XPCI. The proposed algorithm is motivated by the theoretical insights 
gained from Sects. 14.2 and 14.3. 


14.4.1 Motivation 


14.4.1.1 Significance of Constraints 


The stability results of Sect. 14.3 heavily rely on support constraints; without them,
XPCI is ill-posed or even non-unique. To guarantee stability in practice, image
reconstruction methods must thus be able to exploit support-knowledge. Other types
of a priori knowledge (see Sect. 14.1.1.1) are also known to be beneficial. In particular,
imposing non-negativity often has a similar stabilizing effect as support constraints.


14.4.1.2 Necessity of Iterative Methods 


By far the most commonly used reconstruction method for XPCI at synchrotrons is
direct CTF-inversion, as presented in Sect. 2.3. Within the notation of this chapter,
the approach corresponds to quadratic Tikhonov regularization applied to the linearized
forward maps $\mathcal{S}_\nu^{(f_1, \ldots, f_\ell)}$ or $\mathcal{T}^{(f_1, \ldots, f_\ell)}$. Owing to the linearity and
translation-invariance of these maps, the reconstruction may be implemented via a multiplication
in Fourier-space (deconvolution), which renders the approach computationally fast.
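
As an illustration of why this approach is fast, the following minimal sketch applies quadratic Tikhonov regularization to a single hologram of a pure phase object. It assumes the linearized model $I - 1 = 2\,\mathcal{F}^{-1}(\sin(\xi^2/(2f)) \cdot \mathcal{F}(\phi))$ on a periodic grid (sign conventions may differ from those of Sect. 14.1); the function names and the regularization parameter `alpha` are hypothetical.

```python
import numpy as np


def ctf_sine(shape, fresnel_number, pixel_size):
    """Phase CTF sin(|xi|^2 / (2 f)) on the FFT frequency grid."""
    ny, nx = shape
    xi_y = 2.0 * np.pi * np.fft.fftfreq(ny, d=pixel_size)
    xi_x = 2.0 * np.pi * np.fft.fftfreq(nx, d=pixel_size)
    xi_sq = xi_y[:, None] ** 2 + xi_x[None, :] ** 2
    return np.sin(xi_sq / (2.0 * fresnel_number))


def apply_ctf(phi, s):
    """Assumed linearized model for a pure phase object: I - 1 = 2 F^{-1}(s * F(phi))."""
    return 2.0 * np.fft.ifft2(s * np.fft.fft2(phi)).real


def ctf_inversion(data, s, alpha):
    """Tikhonov-regularized inverse, evaluated as one multiplication in Fourier-space."""
    return np.fft.ifft2(2.0 * s * np.fft.fft2(data) / (4.0 * s ** 2 + alpha)).real


rng = np.random.default_rng(3)
n, pixel_size, fresnel_number, alpha = 64, 4.0 / 64, 50.0, 1e-2
s = ctf_sine((n, n), fresnel_number, pixel_size)
x = (np.arange(n) - n // 2) * pixel_size
phi_true = 0.2 * ((x[:, None] ** 2 + x[None, :] ** 2) < 0.25)
data = apply_ctf(phi_true, s) + 1e-3 * rng.standard_normal((n, n))
phi_rec = ctf_inversion(data, s, alpha)
print(np.linalg.norm(phi_rec - phi_true) / np.linalg.norm(phi_true))
```

The reconstruction costs two FFTs, but the filter cannot encode a support or sign constraint, and frequencies near the zeros of the CTF (in particular $\xi = 0$) are damped rather than recovered.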
However, direct CTF-inversion is incompatible with the above constraints:


• Support constraints $\operatorname{supp}(h) \subset \Omega$ for $\Omega \subsetneq \mathbb{R}^m$ break translation-invariance.
• Non-negativity is a nonlinear constraint: any reconstruction imposing it depends
nonlinearly on the data $I - 1$, even for linear forward models!


In either case, reconstruction may thus no longer be achieved by deconvolution. Thus,
iterative algorithms have to be applied to impose support- and/or non-negativity-constraints,
for lack of efficient direct reconstruction formulas.


14.4.1.3 XPCI Beyond Linear Models 


Although the linear CTF-model of XPCI has a surprisingly large regime-of-validity, 
there are settings where linear image reconstruction induces severe artifacts arising 
from the neglected nonlinearity, as demonstrated in Sect. 13.3. Reconstruction algo- 
rithms based on the full nonlinear XPCI-model are thus preferable in principle. The 
main obstacle in using such is that direct inversion formulas for the nonlinear model 
are not known. However, when iterative methods are needed anyway (Sect. 14.4.1.2), 
nonlinear forward maps cause little additional difficulty. 


14.4.2 Reconstruction Method 


In the following, we propose a reconstruction algorithm that meets the requirements 
discussed in Sect. 14.4.1. Details can be found in [33]. 

By choosing $F : X \to L^2(\mathbb{R}^m)$, $F \in \{\mathcal{N}, \mathcal{N}_\nu\}$, with $X = L^2(\Omega, \mathbb{C})$ (or $X = L^2(\Omega, \mathbb{R})$) for $\Omega \subset \mathbb{R}^m$,
optional homogeneity- and/or support constraints are incorporated in the forward
operator $F$. Consequently, such constraints are imposed automatically if image
reconstruction in XPCI is performed by inverting $F$ via any generic regularization
method for inverse problems, see Chap. 5. In order to exploit Fréchet-differentiability
of $\mathcal{N}$ (see Sect. 14.1.2.1) and the comparatively moderate nonlinearity of XPCI, we
choose regularized Newton methods as introduced in Chap. 5:


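$$h_{k+1} \in \operatorname*{argmin}_{h \in X} \big\| F(h_k) + F'[h_k](h - h_k) - (I^{\mathrm{obs}} - 1) \big\|_{L^2}^2 + \alpha_k \|h - h_0\|_{H^s}^2 + R_{\geq 0}(h, h_k) \qquad (14.41)$$


for $k = 0, 1, \ldots, k_{\mathrm{stop}}$, with initial guess $h_0 \in X$ (usually $h_0 = 0$), observed (noisy)
hologram(s) $I^{\mathrm{obs}}$ and regularization parameters $\alpha_k > 0$.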

Note that we use a standard squared $L^2$-norm as a data-fidelity term in (14.41), for
lack of an accurate model for the data error statistics in flat-field corrected holograms.
The squared Sobolev-term $\alpha_k \|h - h_0\|_{H^s}^2$ (with $\|h\|_{H^s} := \|(1 + \xi^2)^{s/2} \cdot \mathcal{F}(h)\|_{L^2}$)
imposes tunable (by the choice of $s > 0$) smoothness of the iterates $h_k$ and acts
as a regularizer. Finally, $R_{\geq 0}(h, h_k)$ is a quadratic penalty term that is designed to
correct negative values of $\mathrm{Re}(h_k)$ or $\mathrm{Im}(h_k)$ in the subsequent iterate $h_{k+1}$, see [33]
for details.

In the numerical algorithm, a discretized analogue of the quadratic minimization
problem in (14.41) is solved for images $h_k \in \mathbb{C}^N$, data $I^{\mathrm{obs}} \in \mathbb{R}^M$ and a discretized forward map
$F : \mathbb{C}^N \to \mathbb{R}^M$, via a conjugate-gradient method. The $\alpha_k$ and $k_{\mathrm{stop}}$ are chosen in a
largely automated fashion, as detailed in [33].
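
The structure of the iteration (14.41) can be illustrated on a small toy problem. The following is a minimal sketch and not the implementation from [33]: it assumes a plain $\ell^2$-penalty in place of the Sobolev-term, omits the non-negativity penalty, solves the quadratic sub-problem by a direct linear solve rather than conjugate gradients, and uses a geometrically decaying $\alpha_k$. The toy forward map and all names are hypothetical.

```python
import numpy as np


def regularized_newton(forward, jacobian, data, x0, alpha0=1.0, decay=0.5, n_iter=25):
    """x_{k+1} = argmin ||F(x_k) + F'[x_k](x - x_k) - data||^2 + alpha_k ||x - x0||^2."""
    x = x0.copy()
    for k in range(n_iter):
        alpha = alpha0 * decay ** k
        J = jacobian(x)
        residual = data - forward(x)
        # Normal equations of the quadratic sub-problem in the update d = x_{k+1} - x_k.
        lhs = J.T @ J + alpha * np.eye(x.size)
        rhs = J.T @ residual + alpha * (x0 - x)
        x = x + np.linalg.solve(lhs, rhs)
    return x


rng = np.random.default_rng(4)
A = rng.standard_normal((40, 20)) / np.sqrt(40)


def forward(x):
    """Toy nonlinearity loosely mimicking the contrast exp(-h) - 1."""
    return np.exp(-A @ x) - 1.0


def jacobian(x):
    """Derivative of the toy forward map at x."""
    return -np.exp(-A @ x)[:, None] * A


x_true = 0.1 * rng.standard_normal(20)
x_rec = regularized_newton(forward, jacobian, forward(x_true), np.zeros(20))
print(np.linalg.norm(x_rec - x_true))
```

Constraints enter such a scheme through the choice of the space on which the sub-problem is solved, for instance by restricting the unknowns to the pixels inside a support mask.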


14.4.3 Reconstruction Example 


We assess the capabilities of the proposed method by reconstructing phase $\phi$ and
absorption $\mu$ as independent parameters from a single simulated noisy hologram,
which is shown in Fig. 14.3a. The considered test case is detailed in [33], where
a real-data example is also considered for an analogous setting.

The true phase-image $\phi$ (Fig. 14.3b) is given by a bulk disk of magnitude 0.2,
whereas the true absorption-image $0 \leq \mu \leq 0.02$ shows a logo-structure (Fig. 14.3c).
Accordingly, no homogeneity-constraint is applicable so that the test-case is situated
in the most challenging, unstable setting of XPCI, which has been analyzed
in Sect. 14.3.2. In particular, recall that image reconstruction is non-unique without
exploiting further constraints.

The data is reconstructed using the regularized Newton method from Sect. 14.4.2,
imposing non-negativity of $\phi$ and $\mu$ as well as a support constraint, allowing nonzero
values of $\phi$, $\mu$ only within the circular region marked by the blue dashed line in
Fig. 14.3b, c. The reconstructed images in Fig. 14.3d, e show that the proposed method
correctly attributes the disk-structure to the phase-image $\phi$ and the logo-pattern to
$\mu$, without visible signs of “mixing things up”. The overall lower reconstruction-quality
in $\mu$ compared to $\phi$ is due to the lower signal-to-noise in this parameter, as a
realistically low absorption-refraction-ratio $\beta/\delta \leq 0.1$ has been assumed in the test
case.

Now why does reconstruction of both $\phi$ and $\mu$ from a single hologram work here,
contrary to the usual experience? The diameter of the circular support corresponds
to a relatively low (modified) Fresnel-number $f \approx 87$. According to the analysis in
Sect. 14.3.2, this ensures stability of image reconstruction, as is discussed in greater
detail in [33] and [24, Sect. 6]. By its ability to impose support constraints (and
non-negativity), the proposed Newton-type method makes it possible to exploit this theoretical
stability in practice.


14.5 Regularized Newton-Kaczmarz-SART for XPCT 


In the final section, we present a Newton-type reconstruction method for X-ray
phase contrast tomography (XPCT) that is a compromise between flexibility w.r.t.
a priori constraints and computational performance. We note that the method is an
all-at-once approach, as also proposed in [30, 31, 34]: the 3D-object parameters
$\delta$, $\beta$ are recovered directly from the full tomographic hologram-series, instead of
first reconstructing 2D-images $\phi$, $\mu$ for each hologram individually. Thereby, tomographic
consistency is imposed as an additional constraint in image reconstruction,
compare Sect. 14.1.1.1.

Fig. 14.3 Reconstruction of a general image $h = \mu + \mathrm{i}\phi$ from a single simulated hologram by
a regularized Newton method (test case from [33]). a Hologram-intensity $I$ of size 1920 × 1080. b, c True
phase-image $\phi$ and absorption-image $\mu$ (zooms to the relevant region, which is marked by a red-dashed line in (a)). d, e
Reconstructed images $\phi$ and $\mu$, obtained by imposing non-negativity and support of $\phi$, $\mu$ within the
circular region bounded by the blue-dashed line in (b), (c)

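By replacing $F \in \{\mathcal{N}, \mathcal{N}_\nu\}$ with the corresponding tomographic forward operator
from Sect. 14.1.3, $F_{\mathrm{PCT}} : X \to L^2(\mathbb{R}^2)^t$; $f \mapsto (F(\mathcal{P}_{\theta_j}(f)))_{j=1}^t$ with $X = L^2(\Omega, \mathbb{C})$ (or $L^2(\Omega, \mathbb{R})$),
$\Omega \subset \mathbb{R}^3$, the regularized Newton method from Sect. 14.4.2 may be readily adapted
to solve the inverse problem of XPCT:

$$F_{\mathrm{PCT}}(f) \approx (I_{\theta_j} - 1)_{j=1}^t \quad \text{for } f = k\beta + \mathrm{i} k \delta \in X. \qquad (14.42)$$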


This is done in [2]. Yet, typical problem-sizes in XPCT with ~ 10° dimensions of the 
discretized object- and data-space, are too large for this approach to be competitive 
in terms of computation times and memory requirements. 

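As a remedy, we supplement the approach with a Kaczmarz-type strategy that
exploits the block-structure of the XPCT-problem (14.42). The idea is to cyclically
perform regularized Newton-steps w.r.t. the small sub-problems $I_{\theta_j} - 1 \approx F(\mathcal{P}_{\theta_j}(f))$
defined by the measured holograms $I_{\theta_j}$ under the different tomographic
incident directions $\theta_j$:


$$f_{k+1} \in \operatorname*{argmin}_{f \in X} \big\| F(\mathcal{P}_{\theta_{j_k}}(f_k)) + F'[\mathcal{P}_{\theta_{j_k}}(f_k)]\, \mathcal{P}_{\theta_{j_k}}(f - f_k) - (I^{\mathrm{obs}}_{\theta_{j_k}} - 1) \big\|_{L^2}^2 + \alpha \big( (1 - \gamma)\, \|f - f_k\|_{L^2}^2 + \gamma\, \|\nabla (f - f_k)\|_{L^2}^2 \big) \qquad (14.43)$$


for $k = 0, 1, \ldots, t \cdot n_{\mathrm{stop}} - 1$ with $n_{\mathrm{stop}} \in \mathbb{N}$. The parameters $\alpha > 0$, $0 \leq \gamma < 1$ control
the regularization and smoothing w.r.t. the preceding iterate $f_k$.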

Iterations of the form (14.43) are known as regularized Newton-Kaczmarz [35].
The advantage compared to bulk (i.e. non-Kaczmarz-)methods is that the operator-blocks
$f \mapsto F(\mathcal{P}_{\theta_j}(f))$ require far fewer computations to evaluate than the total
XPCT operator $F_{\mathrm{PCT}}$, which permits efficient computation of the iterates (14.43).
Moreover, Kaczmarz-type methods often exhibit fast initial convergence, typically
reaching a good reconstruction already after one or two cycles over the data, i.e.
for $n_{\mathrm{stop}} \in \{1, 2\}$. To promote convergence, the processing order $\{j_1, j_2, \ldots\} \subset \{1, \ldots, t\}$
of the data-blocks should be chosen such that subsequently fitted directions
$\theta_{j_k}, \theta_{j_{k+1}}$ differ as strongly as possible, which we achieve by following a
“multi-level-scheme” from [36].


14.5.1 Efficient Computation by Generalized SART 


Although the processed data-size is reduced by the Kaczmarz-strategy, the iterates
(14.43) still involve a minimization problem on a high-dimensional space of 3D-objects
$f$. Moreover, if the minimization is performed iteratively, each iteration
requires evaluations of the (discretized) projector $\mathcal{P}_{\theta_j}$ and its adjoint $\mathcal{P}_{\theta_j}^*$, the
back-projector, both of which typically amount to much higher computational costs
than evaluating the XPCI forward map $F$.⁵

Both computational issues can be resolved by computing the iterates (14.43) via
a generalized SART⁶ (GenSART-) scheme, as introduced in [38] for a much more
general class of tomographic Kaczmarz-iterations:


⁵ For images of size $N \times N$, the discretized forward maps $F$ may be evaluated in
$O(N^2 \log N)$ operations, while (back-)projecting 3D-arrays of size $N \times N \times N$ is $O(N^3)$.
⁶ “SART” refers to the simultaneous algebraic reconstruction technique from [37].

GenSART for Newton-Kaczmarz-iterations:


1. Forward-projection: $p_k := \mathcal{P}_{\theta_j}(f_k)$
2. Optimization in projection-space ($u_j = \mathcal{P}_{\theta_j}(1_\Omega)$):


Apk €argmin perz, myl F (Pe) + Fl; p) — 0 - DR 


+a((l— plu + ple tlle? -Vplz) (14.44) 


3. Back-projection update: fk+ı = fk + Fi, (Apk) 


The main benefit of the approach is that the required minimization is cast to
projection-space, i.e. no longer needs to be solved on a high-dimensional space of
3D-objects but merely on 2D-images. Moreover, the whole scheme requires only a
single evaluation of $\mathcal{P}_{\theta_j}$ (1.) and its adjoint $\mathcal{P}_{\theta_j}^*$ (3.), whereas the optimization (2.)
does not involve any of these costly operations anymore.

As is standard for Kaczmarz-type methods, non-negativity of the iterates $f_{k+1}$ (in
real- and imaginary part) may be imposed by adding a final step to the GenSART-scheme:
$f_{k+1, \geq 0} = \max\{0, \mathrm{Re}(f_{k+1})\} + \mathrm{i} \max\{0, \mathrm{Im}(f_{k+1})\}$.
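
The three-step pattern (forward-projection, optimization in data space, back-projection update) can be illustrated for a linear toy problem. The following is a minimal sketch in the spirit of the scheme above, not the GenSART algorithm from [38]: generic matrix blocks `A_j` stand in for the maps $f \mapsto F(\mathcal{P}_{\theta_j}(f))$, the penalty is a plain $\ell^2$-term ($\gamma = 0$), and the small problem in data space is solved with a dense linear solve. All names are hypothetical.

```python
import numpy as np


def kaczmarz_cycle(blocks, data_blocks, f, alpha, nonnegative=False):
    """One cycle of f <- argmin ||A_j f - g_j||^2 + alpha ||f - f_k||^2 over all blocks j."""
    for A_j, g_j in zip(blocks, data_blocks):
        residual = g_j - A_j @ f                               # 1. forward-projection
        gram = A_j @ A_j.T + alpha * np.eye(A_j.shape[0])
        update = np.linalg.solve(gram, residual)               # 2. small problem in data space
        f = f + A_j.T @ update                                 # 3. back-projection update
        if nonnegative:
            f = np.maximum(f, 0.0)                             # optional non-negativity step
    return f


rng = np.random.default_rng(5)
A = rng.standard_normal((60, 30))
f_true = np.abs(rng.standard_normal(30))
blocks = np.split(A, 6)                        # six blocks of ten rows ("directions")
data_blocks = [A_j @ f_true for A_j in blocks]

f = np.zeros(30)
errors = [np.linalg.norm(f - f_true)]
for _ in range(20):
    f = kaczmarz_cycle(blocks, data_blocks, f, alpha=1.0, nonnegative=True)
    errors.append(np.linalg.norm(f - f_true))
print(errors[0], errors[-1])
```

Each block update needs one application of `A_j` and one of its transpose, while the linear system solved in step 2 has only the size of a single data block. This mirrors the cost structure described above.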


14.5.2 Parallelization and Large-Scale Implementation 


Regularized Newton-Kaczmarz, computed via GenSART-schemes, is well-suited 
for large-scale computations and can be efficiently implemented in a parallelized 
manner. While we refer to [29, Sect. 6.3] for a detailed discussion, we mention the 
most important aspects here: 


• Low memory requirements: if the back-projection update (3.) (as well as the
optional non-negativity projection) is implemented as an in-place operation, only
a single 3D-array (storing $f_0, f_1, (f_{1, \geq 0},) f_2, \ldots$) needs to be kept in memory
throughout the whole Newton-Kaczmarz-reconstruction.

• Parallelized optimization: as the optimization-step (2.) works on 2D-images only,
its memory-requirements are low enough to be performed on a single graphics
processing unit (GPU) even for large-scale data. This permits efficient parallelized
implementation of this step.

• Parallelized 3D-computations: The only operations on the 3D-objects $f_k$ are
forward- and back-projections $\mathcal{P}_{\theta_j}$, $\mathcal{P}_{\theta_j}^*$ and pointwise arithmetic. All of these
can be easily parallelized at low communication requirements between the different
processors. In fact, it is possible to implement GenSART-schemes in a
distributed manner: the object-iterates $f_k$ may be split into chunks that are stored
and managed by dedicated machines throughout the whole reconstruction. This
property allows Newton-Kaczmarz reconstructions to be run efficiently on multiple
GPUs.


14.5.3 Reconstruction Example 


We assess the Newton-Kaczmarz method for XPCT-data of freeze-dried Deinococcus
radiodurans bacteria. The experimental data set, acquired with the GINIX setup
from Chap. 3, is composed of 641 holograms of size 2048 × 2048 at tomographic
incident angles $\theta = 0^\circ, 0.25^\circ, \ldots, 119^\circ, 139^\circ, 139.25^\circ, \ldots, 180^\circ$ (one hologram
per angle). 2D orthoslices of the 3D tomographic data (two spatial and one angular
dimension) are shown in Fig. 14.4a-c, emphasizing the missing data between
$\theta = 119^\circ$ and $\theta = 139^\circ$.

The biological sample constitutes a pure phase object to good approximation, i.e.
vanishing absorption $\beta = 0$ may be assumed. Moreover, the sample is localized in
a small subdomain of the imaged 2048 × 2048-sized field-of-view, as can be seen
from Fig. 14.4a-c, i.e. support constraints may be imposed.

For comparison, we reconstruct the XPCT-data with different methods: 


1. CTF+FBP: direct CTF-inversion for each hologram, followed by filtered back-projection
applied to the recovered projections of $\delta$.

2. Linear Kaczmarz: reconstruction by (14.43) over a single cycle $n_{\mathrm{stop}} = 1$, using
the linearized XPCI-model as $F$. Non-negativity of the reconstructed $\delta$ and
support in a centered cube of $512^3$ voxels is imposed.

3. Newton-Kaczmarz: same as (2.), but with the nonlinear XPCI-model as $F$.


2D orthoslices through the reconstructed 512 × 512 × 512 volumes are plotted
in Fig. 14.4d-l. We note the following observations:


• The additional constraints exploited in “Linear Kaczmarz” compared to
“CTF+FBP” largely eliminate low-frequency background-artifacts (compare
Fig. 14.4e-h) and thereby enable quantitatively correct reconstructions of δ.

• Though the sample-induced phase shifts are moderate (k times the projection
of δ is below 1), going over to the nonlinear XPCI-model has significant effects:
especially in Fig. 14.4h, it can be seen that using the linearized model causes
artificial distortions in the recovered object-density compared to the nonlinear
Newton-Kaczmarz-reconstruction in Fig. 14.4j-l.

Accordingly, both the nonlinearity and the ability to exploit a priori constraints 
of the proposed Newton-Kaczmarz method turn out to be vital here to accurately 
reconstruct the anticipated 3D structure of the imaged bacteria⁷: cytoplasm with
blob-shaped inclusions containing the DNA, where each of the two compounds is of 
approximately uniform density. 


⁷ The additional object in the top-left of Fig. 14.4e, h, k is a contaminant particle.

Fig. 14.4 XPCT-reconstruction of Deinococcus radiodurans bacteria with different algorithms.
Rows show 2D orthoslices for: a-c the stack of 641 holograms of 2048 × 2048 pixels each (x, y:
detector-coordinates, θ: tomographic incident angle), d-l reconstructed object-volumes with
different methods (d-f CTF-inversion followed by FBP-reconstruction, g-i Linear Kaczmarz, j-l
Newton-Kaczmarz). The tomographic axis is the y-axis. Scale bars: 1 µm. For details, see text


Chapter 15
Scanning Small-Angle X-ray Scattering
and Coherent X-ray Imaging of Cells


Tim Salditt and Sarah Köster 


Abstract In this chapter we review recent work towards high resolution imaging of 
unstained biological cells in the hydrated and living state, using synchrotron radiation 
(SR) and free electron laser (FEL) radiation. Specifically, we discuss the approaches 
of scanning small-angle X-ray scattering (scanning SAXS) and coherent diffractive 
X-ray imaging (CDI) of cells. 


15.1 X-ray Structure Analysis of Biological Cells: A Brief 
Overview 


The desire to probe the three-dimensional (3D) structure of biological cells and tissues
at high resolution and under hydrated conditions has motivated a continuous and
long-lasting effort to develop suitable high resolution microscopy techniques. Fluo- 
rescence light microscopy provides an excellent tool to label specific biomolecules 
and organelles. As the last three decades have shown, an ever increasing number 
of imaging problems can be addressed by this specific labeling approach. However, 
not only the strength but also the limitation of this microscopy technique is linked 
to the selective imaging of a few components within the cells. Firstly, fluorescence 
microscopy of living cells typically cannot be applied when transfection with fluorescent
proteins is not possible or too invasive. Secondly, some questions in biology
and biophysics cannot be answered from mapping selected macromolecular compo- 
nents, but necessitate the visualization of the entire mass density distribution in the 
cell. In these cases high resolution images with quantitative mass or electron density 
contrast are needed, rather than the distribution of a selected label. Hard X-rays with
multi-keV photon energies can contribute exactly this contrast mechanism related to 
the native electron density distributions in biological matter. 

Apart from this aspect, other specific advantages of X-rays are: (i) a scalable 
resolution down to the X-ray wavelength of Å to nm, (ii) a kinematic nature of
the scattering process enabling quantitative image analysis unaffected by multiple 
scattering, (iii) element specific contrast variation exploiting anomalous effects at 
absorption edges, (iv) compatibility with unsliced (three-dimensionally extended), 
unstained and hydrated specimens due to the large penetration depth. In this chapter 
we review recent studies of biological cells with hard X-rays. We focus on proof- 
of-concept experiments with micro- and nano-focused X-ray beams which have 
extended classical small-angle X-ray scattering (SAXS) to cellular imaging, com- 
bining real and reciprocal space. Classical SAXS is known as a structural technique 
for soft and biological matter, biomaterials and proteins which does not offer any real 
space information and can hardly be used on systems which are as heterogeneous 
as a biological cell. We also include coherent diffractive X-ray imaging (CDI) tech- 
niques as an X-ray imaging modality, which can complement scanning SAXS. We 
do not include, however, X-ray fluorescence microscopy of cells, which is by now
quite well established, see [1]. 

The development of X-ray microscopy and imaging techniques has always been 
closely related to the availability of high brilliance radiation, provided by synchrotron 
radiation sources, and recently also by X-ray free electron lasers (FEL). Ultra-short 
and high brilliance FEL pulses may offer sharp still images of structure even of 
living cells, since the signal is recorded before structural changes occur by radiation 
damage. However, these very recent opportunities should not lead us to believe that 
imaging of cells with X-rays is an entirely new research topic. It has, in fact, already 
started with the pioneering work in the eighties both by the Göttingen group of 
G. Schmahl [2] and the Brookhaven group led by Kirz [3]. Biological microscopy 
with Fresnel zone plates in the so-called water window spectral range is by now 
a mature technique [4, 5]. In this chapter we restrict ourselves to the more recent 
developments of hard X-ray microscopy (i.e. photon energies above 5 keV). 

In scanning SAXS, resolution in real space and reciprocal space is combined
in a hybrid manner. This differs from other approaches, which reach high
resolution either in reciprocal space (based on diffraction averaging over a large
ensemble, as in SAXS) or in real space (based on inverting the diffraction pattern,
e.g. by CDI). In fact, scanning SAXS with nano- or microbeams combines high resolution
in reciprocal space (by analysis of the diffraction patterns and accounting for the
available q-range) with resolution in real space on the order of the beam size. The
method can hence probe local structures (in reciprocal space) in a range smaller
than the beam size, down to the length scale given by the signal-to-noise cut-off. This
cut-off depends on the degree of order in the sample and is typically intermediate
between length scales of the organelle and the molecular constituents. At the same
time, the resolution in real space is limited by the focal spot size. Scanning SAXS was
first demonstrated on biomaterial specimens such as wood and bone [6, 7], and
then also for various tissue samples [8], with typical real space resolution values in
the range of several microns. More recently, nano-focusing techniques (see Chap. 3)
have made it possible to reach spot sizes well below 100 nm, based on reflective
(mirrors, waveguides), refractive (compound refractive lenses) or diffractive optics
(Fresnel zone plates, multilayer zone plates). Nano-focusing with X-rays is reviewed
in [9], and also treated in advanced textbooks [10], while biological materials and
cells imaged by diffraction and scattering have been previously reviewed in [11].

Direct imaging in real space, with a resolution below the X-ray spot size, is 
enabled by (far-field) CDI or (near-field) holographic techniques, see Chap. 2. For 
extended and non-compact objects such as cells, ptychographic CDI, or multi-plane 
holographic recordings are well suited to solve the phase problem, since the support 
constraint cannot be used. As in scanning SAXS, contrast is based on the native 
electron density distribution in hydrated biological cells. However, in contrast to 
scanning SAXS the specimen is directly imaged in real space. This is possible without 
any labeling, fixation, or staining. Within certain dose restrictions and for a short 
time span, CDI is also amenable to living cells [12]. The dose values to observe 
a time series on the same cell, however, are prohibitively high. For static images, 
a resolution below 100 nm is possible at synchrotron sources, while a range below
10 nm may be reachable with a single ultra-short X-ray pulse using FEL radiation. In
fact, the first ptychographic imaging of a cell already achieved 85 nm resolution on
low contrast (unstained) bacterial cells at a fluence of 10⁷ photons/µm² [13, 14].
Extrapolating from these results and assuming an I ∝ q⁻⁴ power law decay for the
scattering intensity, FEL pulses delivering 10¹³ photons/µm² in a time span below
50 fs would result in a resolution better than 3 nm. Actual experiments, however, are still
about a factor of ten above this estimate, see for example [15], who have reported 
2D reconstructions of projected electron density with 37 nm (half period) resolution 
for living bacterial cells. 
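
The extrapolation argument can be made explicit with a short sketch. It assumes that the scattered intensity decays as a power law in q and that the resolution is set by the q value at which the signal reaches a fixed detection threshold, so that the resolution scales with fluence to the power -1/4 for a q⁻⁴ decay. The function name is hypothetical, and the fluence gain of 10⁶ is an assumed example value.

```python
def extrapolated_resolution(d_ref_nm, fluence_ref, fluence_new, exponent=4.0):
    """Scale a reference resolution to a new fluence.

    Assumes I(q) ~ q**(-exponent) and a fixed detection threshold,
    which gives d ~ fluence**(-1/exponent).
    """
    if fluence_ref <= 0 or fluence_new <= 0:
        raise ValueError("fluences must be positive")
    return d_ref_nm * (fluence_ref / fluence_new) ** (1.0 / exponent)

# Reference resolution of 85 nm taken from the text; the gain of 1e6 is assumed.
d_new = extrapolated_resolution(85.0, fluence_ref=1.0, fluence_new=1.0e6)
print(f"extrapolated resolution: {d_new:.2f} nm")
```

Under these assumptions a fluence gain of six orders of magnitude improves the resolution by a factor of about 32, which illustrates how weakly resolution depends on fluence for a steep power law decay.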

As we will review here, recent work has now brought scanning SAXS and CDI 
of biological cells to the level where they can complement optical fluorescence and 
electron microscopy. In particular, they can ‘shed X-ray light’ on unlabeled cellular 
structures in cells by providing an electron density based contrast. Biomolecular 
assemblies can hence be investigated without slicing and staining, in fixed cells and— 
with restrictions—also in living cells [16-18]. This new “contrast mechanism” can 
possibly be useful for a very diverse range of problems. Here we name just a few 
examples, which may simply be closer to our perspective than others: 


• protein network architecture and the impact on cellular mechanics
• protein filament bundling by cross-linking
• force generation in cellular locomotion and muscle contraction
• DNA compaction in the nucleus
• amyloid aggregation.


After the following section, which addresses requirements of beam preparation 
and sample environments, we will first review scanning SAXS, followed by a section 
on direct imaging by ptychography and holography. We then address the topic of 
cellular imaging with FEL, and close with a section discussing multi-scale imaging, 
from the cellular to tissue level. 


15.2 Methods: X-ray Optics and Sample Environment


15.2.1 Focusing Optics and Imaging Modalities 


Imaging of cells with hard X-rays has been enabled only by the recent progress 
in X-ray optics and focusing, required to concentrate photons on a single cell or to 
specific regions within the cell. Since scattering of biological matter is typically weak, 
due to low-Z elements involved, beam preparation, cleaning of the beam path by 
apertures, efficient detection, and background subtraction are major issues. Finally, 
in situ optical microscopy is required to select scanning regions, perform alignments, 
and to monitor the cell with respect to radiation damage. The work reviewed here 
has been performed at synchrotron beamlines which combine these functionalities, 
notably ID13 of the European Synchrotron Radiation Facility (ESRF) in Grenoble, 
the cSAXS beamline of the Swiss Light Source, and the Göttingen Instrument for
Nano-Imaging with X-rays (GINIX), installed at the P10 coherence beamline of the
PETRA III storage ring at DESY in Hamburg. GINIX has been specifically designed 
for imaging of cells and tissues by holography and scanning SAXS [19]. Furthermore, 
it is fully compatible with tomography and also correlative optical microscopy [20]. 

As an example, Fig. 15.1 illustrates the different beam configurations and imaging 
modalities offered by the modular compound nano-focus optical system of GINIX. 
Similar modalities have also been realized at other beamlines. The optical system 
is composed of a high gain fixed curvature Kirkpatrick-Baez (KB) mirror and a 
probe filtering module, based on cleaning apertures and/or X-ray waveguides. Three 
different imaging modalities are sketched: 


(a) Scanning SAXS, or more generally nano-diffraction in the small angle or wide- 
angle regime depending on detector position. For cells without mineralized or crys- 
tallized components, only SAXS signals are observed. Diffraction data are recorded 
for each scan point, forming a tensor product with two reciprocal space dimensions 
and two real space dimensions. As in conventional diffraction, a beamstop is required, 
sampling in real and reciprocal space is not very constrained, and coherence can be 
very low. The analysis is largely based on models and fitting of diffraction patterns 
in reciprocal space. 


(b) Ptychography, i.e. far-field CDI with ptychographic phase retrieval. The slits in 
front of the KB are closed to achieve full coherence, and the anti-scatter apertures 
of (a) are replaced by pinholes to compactify the probe, i.e. to absorb the tails of the 
KB in focal space [21]. The sample is then scanned laterally behind the pinhole with 
partial overlap between exposures. Oversampling in the detector plane is required, 
and the beamstop must be sufficiently small or semi-transparent [22, 23] to recover 
low spatial frequencies. 


(c) Holography, i.e. near-field phase contrast imaging in a diverging spherical wave, 
emitted from the exit of an X-ray waveguide. Due to the tighter confinement in the
waveguide’s guiding channel, the divergence of the exit beam increases with respect
to the KB beam, resulting in a higher numerical aperture. The waveguide also results

Fig. 15.1 Schematic of different imaging modalities for biological cells, for the example of the
GINIX endstation of the P10 beamline (PETRA III storage ring, DESY). a For scanning SAXS
the beam is focused by Kirkpatrick-Baez (KB) mirrors and cleaned by two successive soft-edge
apertures (A1, A2) to cut the KB tails, and the diffraction is recorded by SAXS (D1) and WAXS
(D2) detectors, each with respective beam stops (BS). b For ptychography, the KB-beam is made
fully coherent by closing the entrance slits (SL), and the probe can be compactified by pinholes (P)
if necessary for sampling. c For holographic imaging, the sample is moved to a defocus position,
and after alignment with the pixel detector (D1), a high resolution detector (D4) is used to record the
hologram. To increase the numerical aperture and hence the resolution, and to filter the wavefront,
an X-ray waveguide (WG) is placed in the focal plane of the KB. In this way, artifacts related to
the typical wavefront distortions of a KB beam can be avoided. Adapted from [19]


in wavefield and coherence filtering [24]. The holographic pattern is recorded by a 
high resolution detector, and is treated by phase retrieval in the optical near-field, as 
discussed in Chaps. 2 and 13. One advantage of (c) over (b) and (a) is that it is a full 
field technique and that images can be recorded without scanning. Both modalities 
(b, c) yield the projected electron density of the object. 


15.2.2 X-ray Compatible Microfluidic Sample Environments 
for Cells 


X-ray experiments on fixed hydrated and on living cells require a suitable sample 
environment, which closely mimics physiological conditions and does not further 
deteriorate the already low signal-to-noise ratios. For living cells, it is indispensable 
to provide nutrients and control metabolites by a continuous exchange of media or
buffer. Depending on the experiment, one may choose between cultivation of cells
directly in the X-ray chamber or a suitable transfer strategy. Finally, a high throughput 
of suspended cells or the ability to scan a large number of adhering cells is important to 
obtain statistically relevant data sets. Another challenge to be faced when studying 
biological matter, and in particular living cells, is radiation damage. The dose is
defined as

$$ D = \frac{\mu\, I_0\, T\, E_{\mathrm{ph}}}{\rho\, \sigma} \qquad (15.1) $$

where I_0 is the primary beam intensity in photons per time, T the exposure time, ρ
the mass density of the sample, σ the exposed area per scan point, μ the linear
absorption coefficient, and E_ph the photon energy. Fast scanning,
for example, helps to decrease the dose and protect the samples from deterioration. 
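
The dose definition can be made concrete with a short numerical sketch. It assumes the common thin-sample form, dose = absorbed energy per mass = μ I_0 T E_ph / (ρ σ), where the linear absorption coefficient μ and the photon energy E_ph are symbols introduced here for illustration. The function name and all numerical values are assumed example values, not data from the experiments reviewed.

```python
E_CHARGE = 1.602176634e-19  # joule per electron volt (exact SI value)

def dose_gray(flux_ph_per_s, exposure_s, photon_energy_ev,
              absorption_coeff_per_m, density_kg_per_m3, area_m2):
    """Absorbed dose in gray (J/kg) in the thin-sample limit."""
    # Energy absorbed per unit sample thickness, in J/m.
    absorbed_per_length = (absorption_coeff_per_m * flux_ph_per_s * exposure_s
                           * photon_energy_ev * E_CHARGE)
    # Irradiated mass per unit sample thickness, in kg/m.
    mass_per_length = density_kg_per_m3 * area_m2
    return absorbed_per_length / mass_per_length

# Assumed example: 1e11 photons/s at 8 keV in a 200 nm x 200 nm spot,
# water-like absorption (about 1e3 per meter) and density (1e3 kg/m^3).
slow = dose_gray(1e11, 0.05, 8000.0, 1.0e3, 1.0e3, 200e-9 * 200e-9)
fast = dose_gray(1e11, 0.005, 8000.0, 1.0e3, 1.0e3, 200e-9 * 200e-9)
print(f"dose at 50 ms: {slow:.3e} Gy, dose at 5 ms: {fast:.3e} Gy")
```

The dose is linear in the exposure time per scan point, which is why fast scanning reduces it; enlarging the exposed area σ has the same effect at the cost of real-space resolution.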

To this end, the advent of microfluidic devices fabricated by soft lithography 
has been an enabling event for this research field. Even before X-ray experiments 
on cells had become a topic of interest, solution SAXS and structure analysis of 
suspended biomolecular assemblies had already been augmented by the possibility 
to observe in situ structural dynamics by making use of hydrodynamic focusing in 
microfluidic devices, as reviewed in [27, 28]. This approach paved the way to then 
adapt the fabrication processes and to develop X-ray compatible cell culture chambers 
[18, 25]. The most important requirements for flow chambers that are compatible 
with both X-ray studies and cell culture are: radiation stability, low absorption, low 
background scattering, control over the degree of cell adhesion to the materials, 
biological compatibility, and ease of fabrication. A method to custom-build flow 
chambers from UV curable adhesive as channel defining material, Kapton® film as 
radiation resistant window material and optional silicon-rich nitride (SiRN) windows 
as a substrate for cell growth is shown in Fig. 15.2a step-by-step and the resulting flow 
chambers are shown in photographs in Fig. 15.2b [25]. In these devices, the cells are 
constantly supplied with nutrients during the experiments and waste products of the 
cells are flushed away. Another advantage of the constant flow of liquid is that free 
radicals produced due to the radiation are flushed away, the sample is permanently 
cooled and air bubbles are reduced. The device design is very flexible since the 
fabrication is based on photolithography. Therefore, virtually any channel geometry 
can be realized and tested. 

In addition to the home-built flow chambers, one can also use commercially
available microfluidic cell chambers (e.g. ibidi®, Germany) as routinely used for
optical microscopy. For X-ray microscopy, modified versions with adapted window
materials (coated and uncoated SiRN) are preferred, see Fig. 15.2c. Finally, simple
home-built chambers with windows made of cover slides may also suffice for fixed
hydrated cells, if the absorption of X-ray photons by < 200 µm of glass can be tolerated.
In principle, a large set of window materials can be used for the X-ray measurements.
The constraints depend on the imaging modality, since ‘low background’ in SAXS
versus holography is associated with different material properties. For example, the
residual phase shifts induced by traversing a few mm of Zeonor® were found to be
small enough to still allow for phase contrast imaging of attached cells in the channel
by holographic full-field imaging [12]. For scanning SAXS using a nanobeam, on the

Fig. 15.2 a Step-by-step sketch for fabricating X-ray compatible microfluidic devices for cell
imaging. The insertion of the SiRN window is optional. b Photographs (back/front) of the flow
chamber as described in (a). 1: PDMS (polydimethylsiloxane) support; 2: SiRN window; 3: flow
channel; 4: inlet/outlet. c Flow chambers based on commercial microscopy slides with further
insertion of a SiRN window. d Simple chambers with coverslip windows. a, b reproduced from
[25] with permission from The Royal Society of Chemistry, c, d from [26]


other hand, the excess polymer material resulted in an increased background level,
which was reduced by inserting SiRN windows. As also compiled in a number of
recent monographs [29-33], it is now fairly well known which window materials 
are compatible (and at which thickness) with a given imaging modality. Different 
window materials (glass, cyclic olefin copolymers [34], polypropylene) are now 
available for cellular growth. 

An entirely different challenge is to create the sample environment for single pulse 
experiments by FEL. As a new sample has to be delivered to the ‘interaction zone’ 
for a subsequent ‘hit’, tailored microfluidic platforms are required. The strength of 
microfluidics is the high level of controllability. Flows are typically laminar, enabling 
exact control of important experimental parameters such as buffer/media conditions, 
temperature, induction period of reagents to the cells and so on. However, in contrast 
to synchrotron experiments, window-less flow chambers are necessary, since a single 
pulse would already destroy window materials. Free jets or microfluidic channels 
with holes offer minimum background scatter and full compatibility with the
X-ray beam propagating in ultra high vacuum. 

Figure 15.3a shows the example of a free microfluidic jet with laminar flow
beneath a nozzle with a 13 µm exit, which has been used for diffraction on suspensions
of biomolecules with micro-focus synchrotron radiation [35], but is also fully
compatible with vacuum injection and therefore well suited for FEL experiments,
see also [38]. The large speed of the jet, however, results in an elongational shear 
stress, which would certainly harm most eukaryotic cells, whereas it has been shown 

Fig. 15.3 a Microfluidic jet for sample delivery. Jets offer a windowless access to hydrated
biomolecular samples, viruses, and bacteria with continuous high-rate replenishment of samples.
In the diffraction experiment sketched, the jet (3) is aligned in the focal plane of a KB optic (1),
behind a cleanup aperture (2), and the far-field diffraction is recorded by a 2D detector (5), with
a miniaturized beamstop (4) blocking the primary beam directly behind the jet. b Finite element
simulation of the flow field for a jet with a diameter of 13 µm and a break-up length of a few mm.
High flow velocity can be used for orientational alignment of biomolecular assemblies, for example
of a membrane suspension [35], but elongational and shear strain must be reduced for delivery of
cells. c Photograph of the nozzle and the jet. d Optical stretcher. The sketch shows the integration
of the laser system on the microfluidic chip, and the optical axis of the X-ray beam. The capillary is
half-cut to give an inside view. The inset shows the central x-y-cross-section of the system. When a
cell enters the trap, a highly anisotropic stress profile on the cell contour results in its trapping and
stretching [36]. a-c adapted from [35], d from [37]

that bacteria can indeed be delivered by these jets, withstanding the hydrodynamic
stresses [35]. In the meantime, jet technology has advanced tremendously, including
electrostatic control, gas confinement and focusing layers [39]. Further, dedicated
aerosol injectors have been designed for FEL sample delivery [40]. However,
while some of these devices have been demonstrated to be suitable for viruses and 
smaller particles, the sample delivery for cells is still in its infancy. Living cells kept 
in a micro-liquid enclosure array, with each element used only once, as presented 
in [15], are one possible option. Jets with moderate hydrodynamic stress coupled out 
of microfluidic devices may offer more flexibility and control parameters. Finally, 
microfluidic channels with micro-sized holes drilled into their enclosure materials 
may be a third attractive route for further development. 

Manipulation of cells in the beam and probing at controlled application of force is 
an entirely different challenge for the sample environment. In Fig. 15.3d we show the 
example of an optical stretcher, which was recently used to trap macrophages, and 
to rotate them in the beam for a tomographic scan [37]. In the stretcher, two opposing
and divergent laser beams are used to trap and to stretch cells. The stretcher
was developed by Guck et al. [41] as a tool for studying the elastic properties
of biological cells based on video microscopy of their deformed shape functions,
since high deforming forces can be applied to biological objects such as cells. In
this respect, stretchers offer an added functionality complementary to optical tweezers,
which are commonly used to micro-manipulate micron and sub-micron-sized
particles. In [37] the experimental capabilities of an optical stretcher as a potential
sample delivery system for X-ray diffraction and imaging studies were explored. Even
in a non-optimized configuration based on a commercially available optical stretcher
system, X-ray holograms could be recorded from different views on a biological cell
and the three-dimensional phase of the cell could be reconstructed. By means of high
throughput screening, the optical stretcher could possibly become a useful tool, both
for SR and FEL studies.


15.3 Scanning Small-Angle X-ray Scattering of Cells 


In the past, we have focused our efforts to image cellular components mostly on, first, 
the cytoskeleton, due to particularly well ordered structures (see, e.g., Figs. 15.4a and 
15.7a, b). Second, we have investigated the packing and (de-)compaction of DNA in 
the nucleus of eukaryotes (see Fig. 15.6a) and nucleoids of bacteria (Fig. 15.9d). The 
cytoskeleton determines the shape, motility, viscoelastic properties and generated 
forces of eukaryotic cells and is also a fascinating active soft matter system. It consists
mainly of three distinct filament systems, along with associated binding proteins
and molecular motors. Actin filaments play an essential role for directed cellular
motion via polymerization and depolymerization in lamellipodia and filopodia, form the
stabilizing actin cortex and, together with myosin motors, stress fibers as the basic
building blocks enabling the contractility of muscle cells. Microtubules contribute
a system of ‘tracks’ within the cell, along which motor proteins transport cargo. They 
are also instrumental in cell division as they pull the chromosomes into the daughter 
cells. Intermediate filaments, such as vimentin or keratin, finally contribute greatly 
to passive mechanical properties of cells and protect the cell from destruction by 
heavy impact. 

The DNA, which contains the genetic information of an organism, is densely 
packed in the nucleus or nucleoid and needs to be unpacked in a highly controlled 
manner for protein production. This apparent contradiction is solved by nature using 
a strictly hierarchical way of packing. To study the architecture and interactions of 
biomolecular assemblies, biophysicists have long used SAXS. As the model
systems get more and more complex, however, SAXS data interpretation is often
impeded, in particular in the presence of large heterogeneity and the lack of molecule
specific information available. Scanning SAXS, in particular when combined with
visible light microscopy, addresses this challenge by providing additional real space
constraints. Since scanning SAXS patterns average only locally over a small volume, 
a complete powder average is no longer obtained. For this reason, the anisotropy of 
the SAXS pattern can be measured, providing further clues. Of course, the indirect 
modeling of diffraction signals could be made obsolete altogether, if direct inversion 
of a coherent diffraction pattern was possible. However, this is to date not possible 
at the resolution which is typically achieved by SAXS. Apart from coherence and 

Fig. 15.4 Scanning SAXS with a nano-focused beam. a Inverted gray scale fluorescence micrograph
of a SW13 cell with keratin K8/K18; the red arrow points to a particularly dense keratin
bundle structure. b Corresponding X-ray dark field overview image. c Composite image of
individual SAXS patterns corresponding to the region in the red box in b. d Detailed X-ray dark
field image of ROI (red box in b) recorded at smaller step size. e Single 2D diffraction pattern
which clearly shows the orientation of the keratin bundles by the anisotropy of the signal (left)
and integrated 1D intensity curve (right). Segments 1 and 5 align with the anisotropy and show
distinct modulations of the signal. f X-ray darkfield scan of muscle induced human mesenchymal
stem cell (hMSC), along with g the corresponding analysis of the anisotropy of the diffraction
pattern revealing for example local orientations of networks of cytoskeletal components. h Single
diffraction pattern of the region marked in (g). a-e adapted from [17], f-h from [26]


sampling issues which have to be met, inversion of data at higher momentum transfer 
and at low signal-to-noise level is much more difficult than just fitting a decay. Most 
importantly, model-based interpretation of SAXS data makes sense despite the fact 
that only one 2D projection is available. In contrast, the information of the 2D projected
electron density in real space, obtained for a single projection angle, becomes
useless at small scales without 3D tomography (which is often technically not
feasible).

For this reason, we regard scanning SAXS as an indispensable tool in X-ray
microscopy of biological cells, even if the first SAXS studies of individual cells 
(disregarding tissues and biomaterials) were only published in 2012 [17, 42]. One 
of the two papers used SAXS only as a complement to a ptychographic experiment 
to circumvent the stringent oversampling conditions, in search for structural clues 
on the condensation of DNA in the bacterial nucleoid [42]. Deinococcus radiodurans
was chosen as a model system for the debated mechanisms of radiation damage
repair [14, 42, 43]. Cells were prepared by rapid vitrification followed by freeze
drying, and the SAXS signal was found to obey a power law decay q^-ν with
ν ≈ 3.2 to 3.7, depending on the scan. The cross-over of the power law decay to the
noise floor, which can serve as a resolution criterion, was observed at q = 0.188 Å⁻¹.
The first eukaryotic cells were studied at the same time by experiments dedicated to
the development of scanning SAXS on cells. For this purpose, SW13 epithelial cells
with a pronounced keratin K8/K18 network were chosen in order to highlight one of
the three components of the cytoskeleton, see Fig. 15.4a-e. The cells were grown on
SiRN windows, which are excellent substrates for cell growth and virtually transparent
for X-rays, and subsequently chemically fixed, plunge-frozen and freeze-dried.
This ensured a high electron density contrast between the cellular material and the
surrounding air, which facilitated the recording in the starting phase. Cells were then
imaged by scanning SAXS with a small beam (140 × 110 nm² and 200 × 125 nm²)
and with small step sizes (50-2000 nm). For illustration, an X-ray dark field image,
where in each position all scattering is integrated and plotted on a color scale, is
shown in Fig. 15.4b, a detail in Fig. 15.4d. Example diffraction patterns are shown
in Fig. 15.4c and at larger magnification in Fig. 15.4e. The signal was then integrated
azimuthally to obtain 1D I(q) curves and, despite the small sample volume
probed (beam size multiplied by sample thickness), distinct diffraction peaks were 
observed [17]. 
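
The basic processing steps named in this paragraph (dark field contrast, azimuthal integration, and a power law fit) can be sketched with NumPy. This is a minimal illustration under stated assumptions, not the analysis code of the cited studies: the function names and the array layout (scan rows, scan columns, detector rows, detector columns) are hypothetical, the radial axis is given in detector pixels rather than q, and the test pattern is synthetic.

```python
import numpy as np

def dark_field(scan):
    """Integrate all scattering per scan point: (ny, nx, qy, qx) -> (ny, nx)."""
    return scan.sum(axis=(-2, -1))

def radial_profile(pattern, center, n_bins):
    """Azimuthally average one 2D pattern into I(r), with r in detector pixels."""
    yy, xx = np.indices(pattern.shape)
    r = np.hypot(yy - center[0], xx - center[1])
    edges = np.linspace(0.0, r.max(), n_bins + 1)
    idx = np.clip(np.digitize(r.ravel(), edges) - 1, 0, n_bins - 1)
    sums = np.bincount(idx, weights=pattern.ravel(), minlength=n_bins)
    counts = np.bincount(idx, minlength=n_bins)
    with np.errstate(invalid="ignore", divide="ignore"):
        profile = sums / counts  # empty bins become NaN
    return 0.5 * (edges[:-1] + edges[1:]), profile

def power_law_exponent(q, intensity):
    """Slope of log I versus log q; returns nu for I ~ q**(-nu)."""
    good = (q > 0) & (intensity > 0) & np.isfinite(intensity)
    slope, _ = np.polyfit(np.log(q[good]), np.log(intensity[good]), 1)
    return -slope

# Synthetic isotropic pattern with an assumed exponent of 3.5 and a blocked center.
yy, xx = np.indices((65, 65))
r = np.hypot(yy - 32, xx - 32)
pattern = np.where(r >= 1, np.maximum(r, 1.0) ** -3.5, 0.0)
scan = np.broadcast_to(pattern, (3, 4, 65, 65))  # toy 3 x 4 scan grid

df_map = dark_field(scan)
radius, profile = radial_profile(pattern, center=(32, 32), n_bins=45)
window = (radius >= 5) & (radius <= 30)
nu = power_law_exponent(radius[window], profile[window])
```

Converting the pixel radius to q requires the detector distance, pixel size and wavelength, which are deliberately left out here. Anisotropic patterns, as in Fig. 15.4e, would need azimuthal segments instead of a full azimuthal average.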

Pronounced streaks in diffraction patterns of Dictyostelium discoideum were 
explained with formation of fiber bundles in the acto-myosin contractile ring, based
on comparing the ring-like occurrence of these features in the SAXS scan to the typical
contractile ring observed in fluorescence microscopy [16]. Modulations in the streaks
were modeled based on a simple fiber scattering model [16, 32]. The strength and 
anisotropy of diffraction patterns were compared between naive human mesenchy- 
mal stem cells and differentiated stem cells, indicating that order of the cytoskeleton 
of stem cells increases during the differentiation process [26]. Modulations in the 
azimuthally integrated SAXS data could even be quantified by rendering the quali- 
tative models into fitting procedures. Thus, the packing geometry, filament diameter 
and center-to-center distance and bundle diameter could be determined for keratin 
bundles in intact cells. The results correspond exactly to those derived from scanning 
electron microscopy, albeit without the need for slicing the cell [44]. 

Yet another highly ordered state of actin, namely hair cell stereocilia from the
inner ear, was studied in [45]. These cell protrusions act as force sensors and enable
hearing by transforming mechanical bending into a neuronal signal. They are filled
with numerous parallel, densely packed actin filaments. Stereocilia were ‘stamped’
on coated SiRN windows, chemically fixed, stained for actin, and freeze-dried.
Scanning SAXS experiments revealed an extremely high order within the stereocilia, but
also considerable spatial heterogeneity in the translational and orientational structure.

Fig. 15.5 a Darkfield image of a chemically fixed, hydrated cell and b of a living cell. Note that
in (b) several scan lines were skipped in order to account for the severe effects of radiation damage
in this case. c 1D radial intensities plotted against the q values; note that the data for living cells
(blue) are scaled by a factor of 10 for better visibility. d Power law exponents of chemically fixed
(black) and living (blue) cells. The latter are consistently larger, indicating nanoscale changes.
e Difference signal; data for fixed cells subtracted from data for living cells. Thus, values below
0 show structures emerging upon fixation, whereas values above 0 hint at structures that were
destroyed by the fixation process. Reprinted with permission from [18], Copyright (2014) by the
American Physical Society

After these early studies proved successful regarding interpretation and signal 
level, the next step was to investigate chemically fixed and living cells in hydrated 
state. Interestingly, distinct differences in the power-spectra of living and chemically 
fixed cells (see Fig. 15.5) were observed, pointing to both emerging and destroyed
nanostructures upon chemical fixation [18]. This study illustrated very clearly the 
advantage of X-ray nano-imaging over other imaging methods. So far, X-ray imaging 
is the only way to directly compare fixed (or labeled, stained) and untreated whole 
cells since extensive sample preparation is not necessary for imaging. The results 
become particularly important now that fluorescence microscopy, where chemical 
fixation is a routine method, reaches well into the affected length scales (see top axis 
in Fig. 15.4e). In further studies, scanning SAXS was carried out on cryogenically 
preserved (i.e. vitrified) cells [16], which offers a larger range of dose before damage 
is observed. A direct comparison of the power law exponents for the decaying 1D 
SAXS curves shows that the results from cryo-preserved samples can be reproduced
at room temperature; however, the primary beam needs to be attenuated and the
exposure time decreased [46].


15 Scanning Small-Angle X-ray Scattering and Coherent X-ray Imaging of Cells 417 


Following the first proof-of-concepts of scanning SAXS for biological cells, the 
optimization of imaging and analysis capabilities became important. To this end, 
several issues of optics, instrumentation and analysis were addressed: (i) choice of 
focus and influence of focal size on the SAXS data quality, (ii) rapid scanning using 
continuous movements and synchronized detector read-out, (iii) improvements of 
alignment procedures and tools based on in situ optical microscopy, (iv) optimized 
pixel detector read out and control software [47], (v) optimized beam preparation and 
suppression of tail scattering, for example, of the KB mirror system, (vi) specially 
fabricated semi-transparent beam stops [22], as well as (vii) the completion of a
versatile toolbox for scanning SAXS which was made publicly available [48, 49]. In
this way, it now has become possible to compute structural observables in a fast and 
automated way, based on empirical data descriptors. Algorithms for semi-automatic 
quantification of the diffraction patterns include analysis of anisotropy parameters by 
automated fitting of ellipsoids [17], decomposition into principal components [26],
automated power law fitting [18, 44, 50, 51] and the computation of cumulants
[50] to describe the azimuthally averaged structure factors for different regions-of- 
interest within the cells. In this way, empirical analysis has become possible also 
for extremely large data sets. At the same time, model based analysis of diffraction 
patterns, based on fiber models with free fitting parameters [16, 44], or sarcomere 
models for muscle cells [49] has been used. The effects of different sample preparation
methods on the detected structure factor, including the freeze-dried, chemically
fixed, frozen hydrated and living states, were investigated more systematically.
Radiation dose effects were also studied, and dose was precisely quantified by including
ptychographic reconstruction of the probing beam [42]. In several studies, scanning
SAXS and CDI were combined on the same cell, including combinations of SAXS
with holography [51], and SAXS with ptychography [42, 44, 45].
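
In its simplest form, the automated power law fitting mentioned above is a straight-line fit in a double-logarithmic representation of I(q). The following sketch assumes a pure power law I(q) = K q^(-ν) without background term and without statistical weighting; the function name and the q-range arguments are hypothetical, and the published analyses [18, 44, 50, 51] may use more elaborate models.

```python
import numpy as np

def fit_power_law(q, intensity, q_min, q_max):
    """Fit I(q) = K * q**(-nu) on [q_min, q_max] by linear regression in log-log space.

    Returns (nu, K). Non-positive intensities are excluded, since their
    logarithm is undefined.
    """
    q = np.asarray(q, dtype=float)
    intensity = np.asarray(intensity, dtype=float)
    mask = (q >= q_min) & (q <= q_max) & (intensity > 0)
    slope, intercept = np.polyfit(np.log(q[mask]), np.log(intensity[mask]), 1)
    return -slope, np.exp(intercept)
```

Applied per scan point or per region-of-interest, the returned exponent and prefactor correspond to the kind of maps shown as Porod exponent and Porod constant in Fig. 15.6. Restricting the fit range to values above the beam stop and below the noise floor is essential for meaningful exponents.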

It is not possible (yet) to record “movies” of living cells using scanning SAXS. 
Thus, to obtain temporally resolved information, an indirect approach of recording
“snapshots” of cells has to be taken. An example is shown in Fig. 15.6, where the stage
in the cell division cycle was determined by visible light bright field microscopy and 
the nuclei of these cells were recorded by scanning SAXS. Thus, it is possible to 
relate the SAXS signal and the exponents derived to the cell cycle time point (Fig. 
15.6b) and derive information on the compaction and decompaction of the DNA.

The best choice of real space and reciprocal space resolution in scanning SAXS 
deserves some special consideration, and a compromise has to be found for the spe- 
cific application at hand. For example, a tight nano-focus leads to higher divergence 
and often also pronounced beam tails or streaks which can easily compromise the 
low-q signal in the SAXS pattern. If this is to be prevented, the beam size must not be
too small, typically in the range of a few microns rather than a few hundred
nanometers. Figures 15.7 and 15.8 show examples of micro-SAXS for cardiac
tissue cells, where a clean recording of the sarcomere diffraction pattern at low q
required relaxation of real-space resolution to the micron range. Figure 15.7 shows 
that the corresponding real space images of neonatal rat cardiac tissue cells can still 
be recognized and correlated to the fluorescence micrographs, while at the same time 
the quality of the SAXS signal was improved, enabling automated decomposition of 

Fig. 15.6 a Schematic of the cell and nucleus division cycle of a eukaryotic cell. b Porod
exponents of data taken from cells in different stages of the cell division cycle. An increase of the
exponent towards 3.8 coincides with the decompaction of the DNA as the cell grows. The
subsequent decrease is related to the compaction as the DNA is duplicated. Spatial representation of the
DNA compaction state: c dark field image, d Porod exponent and e Porod constant. Interestingly,
the Porod exponent is fairly homogeneous throughout nucleus and cytoplasm, indicating a similar
degree of heterogeneity in the structures. The Porod constant, by contrast, highlights dense regions
in the nucleus, presumably nucleoli. Reprinted with permission from [46], Copyright (2016) by the
American Chemical Society


scattering patterns into principal components [26]. Using the micro-SAXS approach 
and hydrated adult cardiomyocytes, the myofibril diffraction signal reflecting inter- 
filament distances of thick filaments (myosin) and thin filaments (actin) could indeed 
be observed from selected regions within a single cell, see Fig. 15.8. 


15.4 Coherent X-ray Imaging of Cells 
15.4.1 Ptychography 


In scanning SAXS, the real-space resolution is determined by the focal spot size.
In contrast, super-resolution below the spot size can be achieved by CDI, where an
oversampled far-field diffraction pattern is inverted by solving the phase problem. This
was first achieved only under the restrictive assumption of a compact object with
known support and fully coherent plane wave illumination [54, 55]. The restriction
to a ‘compact object’ was lifted by ptychographic CDI (pCDI), which uses a
compact probe and partial overlap between illuminations of adjacent scan points

Fig. 15.7 Scanning SAXS with a micro-focused beam. a, b Visible light fluorescence micrographs
of a freeze-dried neonatal rat cardiac tissue cell with labeled actin cytoskeleton. The most significant
filaments were identified by the filament sensor algorithm [52], see (a) (yellow lines), and a mean
orientation angle was computed for all segmenting blocks with a blocksize equal to the stepsize
of the SAXS scan, see (b) (red lines). c X-ray dark field of the SAXS scan. d PCA results on
each diffraction pattern showing the fiber orientation as black lines. The degree of anisotropy was
quantified by a unitless order parameter wpa as further detailed in [26, 50]. e Composite image
showing the diffraction patterns with respect to their relative recording position. Black and gray
arrows show the direction of the major and minor principal component axis. Diffraction patterns
can be integrated azimuthally, see (e, inset), indicating the local structure factors, which can then
be described mathematically by different fitting models (pink lines). Scale bar: 40 µm. From [50]


to phase the diffraction pattern [56-58]. Exploiting the constraint of separability 
for phasing, both an unknown object o and an unknown probe p can be recon- 
structed [58], so that pCDI became applicable not only to extended samples, but 
also to non-idealized illumination functions (probes), including partial coherence 
[59]. A first application of ptychography to biological cells was shown in [14] for
the Gram-positive bacterium Deinococcus radiodurans. Images of the projected
mass density in freeze-dried preparations were reconstructed with a phase resolution 
below 0.01 rad up to a half-period resolution of 85 nm, at a relatively low fluence of 
6.6 × 10⁷ photons/µm². Subsequent studies of the same bacteria, with substantially
improved optics and phase retrieval extended this to 3D tomographic imaging [42, 
43], see Fig. 15.9, and to cryogenically fixed cells [32]. By providing mass density
values, the results contributed to the long-standing debate on DNA compactification
in bacterial nucleoids [43]. In [60], malaria infected red blood cells were imaged by 

Fig. 15.8 Micro-SAXS results obtained for adult

ptychographic tomography. In [61], quantitative phase and amplitude pCDI
reconstructions were used to segment features and to localize mitochondrial constriction
sites in mouse embryonic fibroblasts. Frozen-hydrated cells, which tolerate more 
dose than room temperature samples, and thus offer higher resolution, were imaged 
in 2D [62, 63], and in 3D by X-ray nano-tomography in [64]. Correlative cellular 
ptychography with functionalized Fe nanoparticles was shown in [65]. 
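
The separability constraint described above, with an exit wave ψ_j(r) = p(r − r_j) o(r) at each scan point j, can be illustrated by a single object update of the extended ptychographic iterative engine (ePIE), a widely used reconstruction scheme. The sketch below is not necessarily the algorithm used in the studies cited here: the function names, the pixel-offset convention for the scan position, the fixed (known) probe and the step size alpha are assumptions made for the example.

```python
import numpy as np

def forward(obj, probe, pos):
    """Far-field intensity with the probe placed at pixel offset pos = (row, col)."""
    y, x = pos
    n, m = probe.shape
    exit_wave = probe * obj[y:y + n, x:x + m]
    return np.abs(np.fft.fft2(exit_wave)) ** 2

def epie_object_update(obj, probe, pos, intensity, alpha=1.0):
    """One ePIE-style object update for a single scan position.

    The current exit wave is propagated to the detector, its modulus is
    replaced by the measured one, and the corrected wave is used to update
    the illuminated object patch.
    """
    y, x = pos
    n, m = probe.shape
    patch = obj[y:y + n, x:x + m]
    psi = probe * patch
    far = np.fft.fft2(psi)
    # Modulus constraint: keep the phase, impose the measured amplitude.
    far = np.sqrt(intensity) * np.exp(1j * np.angle(far))
    psi_new = np.fft.ifft2(far)
    step = alpha * np.conj(probe) / np.max(np.abs(probe) ** 2)
    updated = obj.copy()
    updated[y:y + n, x:x + m] = patch + step * (psi_new - psi)
    return updated
```

A full reconstruction would loop this update over all overlapping scan positions for many iterations and apply an analogous update to the probe. It is the overlap between adjacent patches that couples the individual updates and removes the need for a compact support.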


15.4.2 Holography 


The downsides of both original CDI and ptychography are: (i) stringent oversampling 
and coherence conditions, (ii) comparatively slow convergence of the reconstruction, 
(iii) rather long data acquisition for large field of view (no zooming out), and (iv) the 
high radiation dose. A major problem of applying CDI and pCDI to biological cells is 
in fact the low contrast in the hydrated state, and the correspondingly high radiation 
dose and associated radiation damage [66]. In fact, the typical radiation doses of CDI 
are in the range of 10’ — 10° Gy, creating a need for cryogenic conditions to warrant 

Fig. 15.9 Ptychographic reconstructions with simultaneous reconstruction of probe and object.
a Phase image of a Siemens star test pattern with 50 nm lines and spaces, and b complex-valued field
of the probe (focused beam) in the sample position. c Cuts along the optical axis of the propagated
probe indicating significant astigmatism in the X-ray optics (Fresnel zone plate with upstream
focusing). The results show that even with an astigmatic probe ptychography can yield faithful
object reconstructions. d Tomographic reconstruction of Deinococcus radiodurans bacteria. Dense
regions are attributed to DNA-rich bacterial nucleoids, obtained with the same setup at 6.2 keV
photon energy (cSAXS beamline, Swiss Light Source). From [42]


structure preservation. The theoretical dose-resolution curve is characterized by an 
already very steep algebraic increase with an exponent 3 < y < 4, as derived for 
the case of Fraunhofer far-field diffraction [66]. As an example, CDI was applied to 
hydrated bacterial cells (‘wet CDI’), demonstrated at about 30 nm resolution (stated
as half-period throughout this work), but at the ‘cost’ of 10° Gy [67]. 
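
To make the steepness of this scaling concrete, the dose-resolution relation can be written as a power law. The symbols D for the dose, d for the half-period resolution and γ for the exponent are chosen here for illustration only:

```latex
D(d) \propto d^{-\gamma}, \qquad 3 < \gamma < 4
\quad\Longrightarrow\quad
\frac{D(d/2)}{D(d)} = 2^{\gamma} \in (8,\,16).
```

Under this assumption, improving the resolution by a factor of two costs roughly one order of magnitude in dose, and a factor of ten in resolution costs between three and four orders of magnitude.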

These shortcomings could possibly be solved by holographic X-ray imaging, 
employing a highly coherent divergent cone for illumination and geometrically adjustable
magnification, and providing a much more direct encoding of phase information. As
shown in [12], X-ray waveguides can provide the clean wavefronts required for high 
resolution inline holography. While this approach has not yet reached the resolution 
of pCDI or CDI, it was proven experimentally to be very dose efficient. A dose 
advantage over CDI was also found by numerical simulations [68]. Furthermore, by 
variation of the object position, it becomes easily possible to zoom in and out and thus
record multiple magnifications of a sample. Combined studies by holography and 
ptychography with scanning SAXS were published in [51] and [42, 44, 45], respec- 
tively. Holography, in particular, allows for low dose overviews recorded before 
identification of regions-of-interest for scanning SAXS. Figure 15.10 illustrates the 
state-of-art for holographic tomography of cells [69]. 

By now the initially rather exotic area of CDI of cells has become more and 
more established. However, even ptychography, which is already more common than 
holography, has not yet spread into the biological community, for which many forms 
of microscopy are available. In contrast, for tomography of biological materials and
tissues it has found convincing use and for this purpose, the holographic approach has 
become more widespread than ptychography, see [70] for a state-of-the art application 
to human neuronal tissue. One very promising application of X-ray imaging on cells 
is the combination with other microscopy methods, as described in the next section.

Fig. 15.10 Holo-Tomography reconstruction of a macrophage, labeled with barium sulfate and
osmium tetroxide. a Virtual slice through a plane perpendicular to the tomographic axis. The
cell with its internal contrast particles, the Petri dish and the resin used for embedding can be
distinguished based on grey shades which are proportional to electron density. b Virtual slice
coplanar to a projection direction. c 3D rendering of the dataset, showing barium particles (green)
and the cell volume (half-transparent blue). Phase retrieval was based on the CTF approach with 4
distances. Scale bars denote 5 µm. From [69]


15.5 Correlative Microscopy 


Correlative X-ray and optical microscopy may help to overcome some of the persist- 
ing challenges in X-ray data analysis of scanning SAXS and will at the same time 
provide information not accessible by employing just one or the other method. In 
particular, correlative optical fluorescence microscopy can help to formulate models 
and constrain parameters, by providing additional information on specifically labeled 
biomolecules. In the absence of such information, previous studies of biological cells 
by scanning SAXS were mostly analyzed in terms of empirical, model-free data
analysis, with only few exceptions [16, 44]. As shown in Sect. 15.3, already with- 
out a scattering model, a wealth of parameters can be extracted from the diffraction 
patterns in an automated manner, for example total diffraction intensity (darkfield),
differential phase contrast, second moments of the scattering distribution, power-law 
exponents, or anisotropy parameters based on fitting of the 2D scattering patterns 
or principal component analysis (PCA) [17, 50]. By inspection of the diffraction 
patterns and the real-space maps, it seems plausible to attribute diffraction signals in 
some locations to the presence of filamentous proteins of the cytoskeleton or DNA in 
the nucleus. However, such conclusions need confirmation by optical fluorescence 
microscopy, at highest possible resolution. With this information at hand, the local 
diffraction patterns can be interpreted and analyzed, providing in the end much more 
information than either the optical image or the X-ray data alone. 
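
One of the model-free descriptors listed above, the anisotropy obtained from principal component analysis, can be sketched as an eigen-decomposition of the intensity-weighted second moments of a diffraction pattern. The order parameter computed below, (λ₁ − λ₂)/(λ₁ + λ₂) from the two eigenvalues, is an assumed definition chosen for illustration; the precise definition used in [26, 50] is not given in this chapter, and the function name is hypothetical.

```python
import numpy as np

def pca_anisotropy(pattern, center):
    """Orientation and anisotropy of a 2D diffraction pattern.

    Returns (angle, omega): the angle in degrees (0..180, measured from the
    detector column axis towards the row axis) of the major scattering axis,
    and an order parameter omega between 0 (isotropic) and 1 (streak).
    """
    rows, cols = np.indices(pattern.shape)
    y = (rows - center[0]).ravel().astype(float)
    x = (cols - center[1]).ravel().astype(float)
    w = pattern.ravel().astype(float)
    # Intensity-weighted second moments about the beam center.
    cov = np.array([[np.sum(w * x * x), np.sum(w * x * y)],
                    [np.sum(w * x * y), np.sum(w * y * y)]]) / w.sum()
    evals, evecs = np.linalg.eigh(cov)  # eigenvalues in ascending order
    major = evecs[:, 1]
    angle = np.degrees(np.arctan2(major[1], major[0])) % 180.0
    omega = (evals[1] - evals[0]) / (evals[1] + evals[0])
    return angle, omega
```

For fiber-like scatterers the dominant scattering streak is oriented perpendicular to the fiber axis, so the real-space filament direction would be taken as angle + 90°. This is the quantity that can be compared to filament orientations extracted from fluorescence micrographs.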

In [20] a correlative microscopy approach for biological cells and tissues was 
proposed, which combines holographic X-ray imaging, X-ray scanning diffraction, 
and stimulated emission depletion (STED)-microscopy as a super-resolution optical 
fluorescence technique. All three imaging modalities were integrated into the same 
dedicated synchrotron nano-focus endstation GINIX at the P10 beamline of the 
PETRA III storage ring (DESY, Hamburg). With this setup, both labeled and
unlabeled biomolecular components in the cell can be imaged in a quasi-simultaneous
scheme, exploiting the complementary contrast mechanisms of X-ray microscopy 
and optical fluorescence. This was demonstrated for heart tissue cells with a
fluorescently labeled actin cytoskeleton. Micrographs of all three modalities were registered.
The principal directions of the anisotropic diffraction patterns were found to coincide
to a certain extent with the actin fiber directions. Further, actin filament bundles were
also recognizable in the phase map reconstructed from holographic recordings. We 
expect that the co-localization constraints provided by such correlative microscopy 
approaches will be instrumental for the formulation of advanced diffraction models, 
to fully exploit the data which is becoming available (Fig. 15.11). 

Fig. 15.11 Correlative microscopy. Neonatal cardiac tissue cell with labeled actin, imaged in three
different modalities. a STED micrograph. Scale bar: 5 µm. b X-ray phase reconstruction. c X-ray
dark field map of the cell obtained by scanning SAXS. From [20]


15.6 From Cells to Tissues


For many biological functions it is important to integrate structural aspects on 
scales ranging from the single cell to the entire organ. Heart contractility as one 
of the most important physiological functions is a perfect example of how function 
relies on an intricate molecular and cellular architecture. The classical research field 
which addresses the way that cells form a functional tissue is histology, which com- 
bines sophisticated sectioning with optical or electron microscopy. While the cyto- 
architecture can thus be imaged in 2D, conventional histology lacks the capability 
to probe the tissue structure in full 3D. Furthermore, high resolution molecular
structure analysis by electron microscopy can only be carried out in very small
volumes, so that structural variations within the tissue are easily missed. Important and functionally relevant
structural properties, such as for example the 3D vector field of myofibril orientation 
in heart, cannot be suitably assessed by conventional histology. Furthermore, differ- 
ent regions within the heart exhibit variations of the intrinsic sarcomere structure. For 
example, the acto-myosin lattice spacing near the ventricular wall may differ from that at the
outer perimeter of the heart, as observed for mouse heart [49], see Fig. 15.12. Using
the scanning SAXS approach it becomes possible to probe molecular orientation of 
heart tissue, combining the required real space resolution with molecular sensitivity 
by diffraction [49]. Extending this to a series of slices, or - as an alternative - to 
X-ray darkfield tomography [71], one could possibly probe the entire 3D assembly 
of myofibrils. In this way, the multi-scale challenge of mapping molecular structures 
and orientation over length scales of an entire heart may become possible in the future.

Scanning small and wide-angle X-ray scattering (SAXS/WAXS) and X-ray fluo- 
rescence (XRF) with micro-focused synchrotron radiation have also been used in [72] 
to study histological sections from human brain tissue, notably of the midbrain and 
of substantia nigra. Both XRF and scanning SAXS/WAXS were shown to visualize 
tissue properties, which are inaccessible by conventional microscopy and histology. 
While scanning SAXS provided the local orientation and ordering of myelin
structure, WAXS provided the distribution of cholesterol crystallites, and XRF provided maps of
transition metals. All observables were intrinsically registered (aligned) since they
were acquired in the same scan. Transition metals and more generally elemental 
distribution has become a relevant topic for neurodegeneration, for example the iron 
distribution and speciation in Parkinson’s disease (PD). In [72], variations in transition
metal concentration between a PD patient and a control (CTR) were observed. The XRF
analysis showed increased amounts of iron and decreased amounts of copper in the 
PD tissue compared to the control. PD tissue scans also exhibited increased amounts 
of crystallized cholesterol. However, as only tissues from one PD patient and one 
control were available, [72] can only serve as a proof-of-feasibility. 

Fig. 15.12 Scanning SAXS of mouse cardiac muscle. a Optical micrograph of a histological
section. b X-ray darkfield image, i.e. the integrated scattering intensity. c Multiple scattering
parameters extracted in a fully automated manner from the diffraction patterns of the scan: (left) anisotropy
of the scattering resulting from the (1, 1) reflection from the acto-myosin lattice, (center) the
corresponding myofibril orientation, and (right) the mean position of the reflection along q_r, as obtained
from a Gaussian fit to the structure factor I(q_r) with a background model. Scale bar: 1 mm. From [49]


15.7 Outlook: FEL Studies of Cells 


The advent of highly brilliant pulsed X-ray radiation from free electron lasers (FEL) 
has opened up a novel route to high resolution imaging by short femtosecond (fs)
pulses, before radiation damage takes place [73-75]. Ultra short pulses not only 
enable highest temporal resolution for example for pump-probe experiments, but 
also static single pulse imaging unaffected by any (Brownian) motion. If the struc- 
ture is recorded by ultra-fast elastic scattering before changes occur due to multiple 
ionization, this holds promise to record sharp still images of extremely high resolu- 
tion, unaffected by structure deterioration due to radiation damage. This so-called 
“diffract and destroy” principle was initially coined for single molecule CDI envi- 
sioned for FEL, but can also be applied to colloids, viruses or (small) entire cells. 
To this end, feasibility of imaging living cells by ultrafast CDI was discussed in 
[76], based on numerical simulations of the interaction of FEL pulses (10-100 fs) 
with biomolecular matter. It was concluded that subnanometer resolutions could be
reached on micron-sized cells at fluences of 10¹¹–10¹² photons/µm². For mimivirus
[77] and small bacteria (Microbacterium lacticum) [15], single pulse CDI has
indeed been demonstrated by now. However, the resolution was much lower, i.e. 37 nm
(half-period) for the bacterial cells.

Furthermore, single-shot CDI of large extended objects such as eukaryotes is in 
practice impeded by oversampling restrictions, the beam stop induced missing data 
problem [22], and the lack of a priori information (support). This is well illustrated 
by the FEL experiments on freeze dried cells presented below in Fig. 15.13. Notably, 
current pixel detector technology restricts the field of view to around d ≈ 1 µm
for hard X-rays, which is prohibitive for eukaryotic cells. At the same time, FEL 
imaging of cells is limited to 2D. 3D imaging by serial shots with randomly sampled 
projections is possible for identical particles, but not for most cell types. A serial 
implementation of cellular imaging with high throughput of cells would nevertheless 
give a useful distribution of 2D views for a given cell type and state. 

How can the maximum support cross section d resulting from support/oversampling
constraints be increased? As we have d = λz/(2p), where p is the detector
pixel size, z the distance, and λ the wavelength, it seems reasonable to increase
the wavelength for eukaryotic cells, which in addition also increases the scattering
intensity. To this end, single pulse CDI experiments on freeze dried cells were also
carried out in the water window spectral range, using FEL radiation of λ = 8 nm at
the FLASH facility (DESY, Hamburg). While the signal and hit rate were sufficient, 
see Fig. 15.13, several restrictive conditions have impeded reconstruction: (1) Insuf- 
ficient degree of coherence: The global degree of coherence of the third harmonic 
was determined to be only around 0.4 [78]. (2) Insufficient data at low q: The missing
data due to oversized beam stops and beam stop holders leads to unconstrained
low frequencies. (3) Insufficient sampling: The large fields of view needed for the 
adherent cells were still not compatible with the oversampling conditions. From this 
attempt and other examples, one can learn that the strategy to collect and analyze 
data for eukaryotic cells at FEL has to be revised. 
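
The oversampling relation d = λz/(2p) quoted above can be evaluated directly to see why the wavelength matters. The numbers in the following sketch are hypothetical and chosen only for illustration: a hard X-ray geometry with λ = 0.1 nm, z = 5 m and p = 172 µm, and a soft X-ray geometry with λ = 8 nm, z = 0.3 m and p = 13.5 µm. They are not the parameters of the experiments described here.

```python
def max_support(wavelength_m, distance_m, pixel_m):
    """Maximum object extent d = lambda * z / (2 p) compatible with oversampling."""
    return wavelength_m * distance_m / (2.0 * pixel_m)

# Hypothetical hard X-ray geometry: 0.1 nm, 5 m, 172 um pixels.
hard = max_support(1.0e-10, 5.0, 172e-6)
# Hypothetical soft X-ray geometry: 8 nm, 0.3 m, 13.5 um pixels.
soft = max_support(8.0e-9, 0.3, 13.5e-6)

print(f"hard X-rays: d = {hard * 1e6:.2f} um")  # hard X-rays: d = 1.45 um
print(f"soft X-rays: d = {soft * 1e6:.1f} um")  # soft X-rays: d = 88.9 um
```

With these assumed values, the hard X-ray case reproduces the order of magnitude d ≈ 1 µm stated in the text, while the longer wavelength and smaller pixels extend the admissible support to the scale of an adherent eukaryotic cell.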

The solution to this challenge could be a hybrid approach based on combining 
two separate paradigms of structural analysis: imaging and diffraction. Low and 
medium resolution projected electron density should be assessed in real space, based 
on near-field (holographic) phase retrieval, while high resolution (molecular) Fourier 
components should be assessed by model based analysis of diffraction data, relax- 
ing the restrictive conditions for inversions. Importantly, by holographic (near-field) 
imaging—with the sample in the defocus position of the nano-focused FEL—even 
extended objects (without support constraints) can be reconstructed from single shot 
data. To this end, iterative algorithms using mild constraints can be readily employed 
in the near-field regime, such as negativity of the phase, unit amplitude (pure phase 
object) or sparsity, as well as combinations thereof. This hybrid approach could be 
implemented for example at the MID instrument at XFEL, using a compound refractive
lens system focusing to 50 nm. At the same time, and in addition to the high
resolution CCD or CMOS detectors needed for this imaging modality, a wide-angle
pixel detector could be used to record the far field scattering intensity outside of the
central diverging radiation cone, i.e. covering Fourier components corresponding to
scales below 50 nm. Figure 15.14 shows a sketch of an FEL holography experiment.
The sample is placed in a controlled defocus position z₁ behind the focal plane of the
CRL. The direct beam traverses a pixel detector with a hole, and reaches the high
resolution detector at large distance where the in-line hologram is recorded.

Fig. 15.13 Single pulse CDI experiments on cells, using soft X-ray FEL radiation at BL2/FLASH,
DESY (λ = 8 nm), at experimental parameters as reported in [78-80]. a Schematic of freeze dried
cells attached to the multi-window SiRN array (576 square windows of 100 × 100 µm²) with membrane
thickness of 100 nm. The system is designed such that a single shot at a given micro-chamber will
leave the other chambers with cells intact. b Optical microscopy of the D. radiodurans (top) and
SK8K18 (human epithelial) cells (bottom). c Setup with the focused beam, multi-window sample
holder, and the thin foils used to suppress the 3rd harmonic at λ_3rd = 2.66 nm by a 200 nm
free-standing Pd-filter, installed directly in front of the CCD detector. To avoid radiation damage on
the CCD detector, a beam stop blocks the central pixels. d Single pulse diffraction patterns of
D. radiodurans bacteria and human epithelial cell SK8K18. The magnification shows that the
oversampling criterion is fulfilled at this wavelength. The power spectral density shows a dynamic
range of more than three orders of magnitude, and hence a sufficient signal. However, the existence
of a beam stop and the lack of compact support impede object reconstruction. From D.-D. Mai
et al., unpublished

Fig. 15.14 a Schematic of divergent FEL beam, calculated in natural units (Rayleigh length z_R,
waist w_0) for 8 keV photon energy, and beam focusing by 50 compound refractive lenses (CRL)
made of beryllium (R = 50 µm, f = 94.6 mm, w_0 = 45.9 nm); parameters are adapted to the MID
instrument of XFEL. Data can be recorded in two ways: (i) single shot far-field diffraction patterns
recorded by a pixel detector (AGIPD) at 10 m distance, and (ii) holographic recordings by a high
resolution sCMOS detector with the sample in a defocus position. b Simulation of the beam intensity
at z₁ = 5 mm behind the CRL focus, overlaid with the cell phantom. Scale bar 2.5 µm. c Simulated
hologram at 10 m distance. Scale bar 1 mm
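
The geometry of the proposed holographic experiment (Fig. 15.14) can be checked with the standard Gaussian beam relations and the geometric magnification of a divergent beam. The sketch below is a rough estimate under stated assumptions: the waist w_0 is interpreted as the 1/e² intensity radius of an ideal Gaussian beam, the 10 m is taken as the sample-to-detector distance, and the detector pixel size of 6.5 µm is a hypothetical value for an sCMOS camera. All function names are illustrative.

```python
import math

HC_KEV_NM = 1.23984198  # Planck constant times speed of light, in keV nm

def wavelength_nm(energy_kev):
    """Photon wavelength in nm for a photon energy in keV."""
    return HC_KEV_NM / energy_kev

def rayleigh_length_nm(waist_nm, lam_nm):
    """Rayleigh length z_R = pi * w_0**2 / lambda of an ideal Gaussian beam."""
    return math.pi * waist_nm ** 2 / lam_nm

def beam_radius_nm(z_nm, waist_nm, lam_nm):
    """Beam radius w(z) = w_0 * sqrt(1 + (z / z_R)**2) at defocus distance z."""
    z_r = rayleigh_length_nm(waist_nm, lam_nm)
    return waist_nm * math.sqrt(1.0 + (z_nm / z_r) ** 2)

def magnification(z1_m, z2_m):
    """Geometric magnification M = (z1 + z2) / z1 for a point-like source."""
    return (z1_m + z2_m) / z1_m

lam = wavelength_nm(8.0)                      # about 0.155 nm at 8 keV
z_r = rayleigh_length_nm(45.9, lam)           # about 43 um
w_sample = beam_radius_nm(5.0e6, 45.9, lam)   # about 5.4 um at z1 = 5 mm
m = magnification(5.0e-3, 10.0)               # about 2000
pixel_eff_nm = 6.5e3 / m                      # about 3 nm (assumed 6.5 um pixels)
```

Under these assumptions the beam at the defocus position is several micrometers wide, consistent with the field of view suggested by the scale bar in Fig. 15.14b, while the effective pixel size in the sample plane lies in the range of a few nanometers.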


Chapter 16
Single Particle Imaging with FEL Using
Photon Correlations


Benjamin von Ardenne and Helmut Grubmüller 


Abstract Scattering experiments with femtosecond high-intensity free-electron 
laser pulses provide a new route to macromolecular structure determination without 
the need for crystallization at low material usage. In these experiments, the X-ray 
pulses are scattered with high repetition on a stream of identical single biomolecules 
and the scattered photons are recorded on a pixelized detector. The main challenges 
are the unknown random orientation of the molecule in each shot and the extremely 
low signal to noise ratio due to the very low expected photon count per scattering 
image, typically well below the number of over 100 photons required by available 
analysis methods. The latter currently limits the scattering experiments to nano- 
crystals or larger virus particles, but the ultimate goal remains to retrieve the atomic 
structure of single biomolecules. Here, we use photon correlations to overcome the 
issue with low photon counts and present an approach that can determine the molec- 
ular structure de novo from as few as three coherently scattered photons per image. 
We further validate the method with a small protein (46 residues), show that near-atomic
resolution of 3.3 Å is within experimental reach and demonstrate structure
determination in the presence of isotropic noise from various sources, indicating that
the number of disordered solvent molecules attached to the macromolecular surface
should be kept at a minimum. Our correlation method also allows structure to be inferred from
images containing multiple particles, potentially opening the method to other types
of experiments such as fluctuation X-ray scattering (FXS).
16.1 The Single Molecule Scattering Experiment


Despite the great effort in biomolecular structure determination, the structures of less 
than 1% (160,000) of the more than 21 million transcribed proteins [1] have been 
determined to high resolution [2]. Over the past years existing structure determination 
methods such as X-ray crystallography and NMR have been stagnating, leaving room 
for novel methods that can extend the knowledge of biomolecular structures. To this 
end, X-ray scattering experiments with single biomolecules have been proposed 
by Neutze et al. as a new de novo structure determination approach for proteins 
without the need for crystallization [3-7]. Single molecule X-ray imaging has become
possible due to newly developed free-electron lasers that produce very high-intensity,
femtosecond-short X-ray pulses with a focus size down to 100 nm.

As illustrated in Fig. 16.1, in the experiment, a stream of (typically) hydrated and 
randomly oriented proteins enters the pulsed X-ray beam at a rate of one molecule 
per pulse. Despite the high photon flux of the incident beam, only a few photons are 
scattered by the molecules and recorded on the pixelized detector. 

Sample delivery is non-trivial due to the nanoscopic size of the biomolecules 
and several solutions have been proposed, e.g., using electrospraying techniques [8], 
gas focused liquid jets [9], oil/water droplet immersion jets [10] or embedding the 
molecules into polymers (lipidic cubic phase injector) to save material [11]. In each 


Fig. 16.1 Experimental setup of single molecule scattering imaging. A stream of randomly-oriented 
particles is injected into the high-intensity short-pulsed FEL beam, hit sequentially by femtosecond 
X-ray pulses, and the few coherently scattered photons (red dots) are recorded on the pixel detector. 
The spatial distribution of the photons follows the Fourier intensity of the molecule which is depicted 
here in light blue in the background of the photon pattern. After illumination, ionization effects 
charge the molecules and the resulting Coulomb forces quickly disintegrate the molecule 
sample delivery method, it is important that the single molecules stay in their physiological
environment in order to observe their natural conformations.

In the scattering process, ionization (Auger decay) charges the atoms in the
molecule and leads to Coulomb explosion, which is why the method is termed a “diffract and
destroy” experiment. In fact, only 10% of all photons are scattered coherently; all
others are absorbed due to the photo-electric effect and expelled shortly after from
the molecules at lower energies. However, the short pulses, usually less than 100 fs
long, outrun the severe radiation damage because the molecular motion in response 
to the changed electronic configuration is estimated to take longer than 100 fs [7, 
12] and the incident photons are scattered by the unperturbed structure before the 
molecule disintegrates. 

As in conventional X-ray crystallography, only the intensities and not the phases
are measured. In the absence of crystals, the measured signal is the continuous Fourier
transform of the molecule, rendering the phase problem accessible to established
ab initio phase-retrieval methods [13].

Whereas previous X-ray sources, including synchrotron sources, have primarily 
engaged in studies of static structures, X-ray FELs are by their nature suited for 
studying dynamic systems at the time and length scales of atomic interactions. In 
contrast to methods that measure a structure ensemble (NMR, SAXS, FRET), this 
method gives access to single molecule images and, with a seed model, the images 
could be e.g., sorted probabilistically to distinguish between different native con- 
formations. Further, similar to nano-crystallography, in systems where reactions can 
be easily induced, e.g., by light, a sequence of structures at different reaction times 
may be recorded which opens the window to molecular movies as a long-standing 
dream [14]. Even without sorting, the variance of the native conformations can be 
assessed via the variance of the determined electron density in which flexible regions 
would be smeared out more than rigid protein motifs. 


16.2 Structure Determination Using Few Photons 


Single molecule scattering images sample spherical dissections (Ewald spheres) of
the continuous 3D Fourier intensity, $I(\mathbf{k}) = |\mathcal{F}[\rho(\mathbf{x})]|^2$, and the orientation of the
dissection depends on the orientation of the molecule at the time of illumination. The
structure determination from these single molecule images faces two major chal- 
lenges. First, the orientation of the molecule at the time of illumination is unknown 
and hard to control because it is usually injected into the “reaction chamber” via 
electro-spraying in which the molecules tumble inside a solvent bubble. Second, 
only a low number of photons is coherently scattered (as a statistical Poisson pro- 
cess following the Fourier intensity) and the additional background noise from, e.g., 
inelastic scattering, the photo-electric effect or background radiation leads to very low 
signal-to-noise levels. In fact, we estimated that a rather small protein (46 residues)
scatters only 20 photons coherently at realistic beam parameters of the next-generation
European XFEL, which adds an additional layer of complexity to the structure
determination problem due to the additional Poisson noise (shot noise).

Over the past years, several structure determination methods have been proposed 
and demonstrated which mainly fall into two major classes. The first class of methods 
predicts the orientation of the molecules at the time of illumination for each scatter- 
ing image either explicitly or implicitly e.g., through statistical similarities between 
images or by using a coarse seed model. Images that belong to the same orienta- 
tion are averaged and these averages are assembled into the 3D intensity similar to 
cryo-EM. However, almost all of the orientation classification methods are limited 
to scattering datasets with usually many more than 100 average photons per image. 

The second class of methods forgoes the classification of orientations by using
photon correlations as an averaged summary statistic of the entire image dataset
that is independent of the individual orientations; this class is covered in this chapter.
Previous attempts have focused on extracting as much information as possible from
two correlated photons using additional knowledge such as symmetry or molecular
rotations around a fixed axis. From early work by Kam on electron micrograph
images, it is known that two-photon correlations do not carry sufficient information
to retrieve the full 3D intensity ab initio [15, 16]. Motivated by this observation, we
suspected, and eventually validated, that three photons suffice and therefore
developed a method that allows for de novo structure determination from
as few as three coherently scattered photons per single molecule X-ray scattering
image. The main idea is to determine the molecule's intensity $I(\mathbf{k})$ from the full
three-photon correlation $t(k_1, k_2, k_3, \alpha, \beta)$, which is accumulated from all photon
triplets in the recorded scattering images, independent of the respective molecular
orientations and therefore free of errors associated with the classification of the
orientations.


16.2.1 Theoretical Background on Three-Photon 
Correlations 


A single photon triplet is characterized by the angles $\alpha$ and $\beta$ between the photons
and the distances of the photons to the detector center (Fig. 16.2). Each triplet
comprises three correlated doublets $(k_1, k_2, \alpha)$, $(k_2, k_3, \beta)$ and $(k_1, k_3, \alpha + \beta)$,
and the angles are chosen as the minimum difference between the pairs, $\alpha, \beta \in [0, \pi]$.
The probability of observing a coherently scattered photon at pixel position
$\mathbf{k}^*$ is proportional to the intensity $I(\mathbf{k}^*)$ at this pixel, which lies on the projection of
the intensity $I(\mathbf{k})$ on the Ewald sphere in 3D Fourier space. The full three-photon
correlation $t(k_1, k_2, k_3, \alpha, \beta)$ is the sum over all possible triplets, which is equivalent
to the orientational average $\langle \cdot \rangle_\omega$ of the product of three intensities $I_\omega(\mathbf{k})$ that lie
on the intersection between the Ewald sphere and the 3D Fourier density,

\[ t(k_1, k_2, k_3, \alpha, \beta) = \left\langle I_\omega(\mathbf{k}_1^*(k_1, 0)) \cdot I_\omega(\mathbf{k}_2^*(k_2, \alpha)) \cdot I_\omega(\mathbf{k}_3^*(k_3, \beta)) \right\rangle_\omega . \tag{16.1} \]
Fig. 16.2 Schematic depiction of the three-photon correlation using an exemplary synthetic single
molecule scattering image of Crambin with only coherently scattered photons. In the detector plane
$k_x k_y$, the recorded photons are grouped into triplets, each of which is characterized by distances
$k_1$, $k_2$, $k_3$ to the detector center (orange lines) and the angles $\alpha$ and $\beta$ between the respective
photons (orange circular arcs)


Here, without loss of generality, the three vectors $\mathbf{k}_1^*$, $\mathbf{k}_2^*$ and $\mathbf{k}_3^*$ are the
projections onto the Ewald sphere of the three photon positions $\mathbf{k}_1 = (k_1, 0, 0)$,
$\mathbf{k}_2 = k_2(\cos\alpha, \sin\alpha, 0)$ and $\mathbf{k}_3 = k_3(\cos\beta, \sin\beta, 0)$ in the detector plane. These
positions are chosen as one arbitrary realization of the tuple $(k_1, k_2, k_3, \alpha, \beta)$.
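The triplet description above can be illustrated with a minimal Python sketch that turns the photon positions of one detector image into tuples $(k_1, k_2, k_3, \alpha, \beta)$. The function names are hypothetical and not taken from the authors' code; as an assumption, $\alpha$ is taken between the first and second photon and $\beta$ between the second and third, following the doublets $(k_1, k_2, \alpha)$ and $(k_2, k_3, \beta)$. Binning and the symmetry mapping discussed later are left out.

```python
import math
from itertools import combinations


def angle_between(p, q):
    """Smallest angle in [0, pi] between two detector positions, seen from the center."""
    d = abs(math.atan2(p[1], p[0]) - math.atan2(q[1], q[0]))
    return min(d, 2 * math.pi - d)


def triplets(photons):
    """Yield (k1, k2, k3, alpha, beta) for every unordered triple of photons."""
    for p1, p2, p3 in combinations(photons, 3):
        k1, k2, k3 = (math.hypot(p[0], p[1]) for p in (p1, p2, p3))
        yield k1, k2, k3, angle_between(p1, p2), angle_between(p2, p3)


# Toy image with three photons (arbitrary units, detector center at the origin).
image = [(1.0, 0.0), (0.0, 2.0), (-3.0, 0.0)]
for t in triplets(image):
    print(t)
```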

For the orientational average $\langle \cdot \rangle_\omega$ it is assumed that in the experiment the orientation
of the molecule is unknown and uniformly sampled. Note that the orientational
average can either be expressed as an average over all rotations of $I_\omega(\mathbf{k})$ for fixed
$\mathbf{k}_{1,2,3}$ (our approach) or as an average over all rotations of the vectors $\mathbf{k}_{1,2,3}$ for a
fixed $I(\mathbf{k})$.

The orientational integral over all possible triple products of 3D intensities $I(\mathbf{k})$
in (16.1) is challenging to calculate and may be simplified by decomposing $I(\mathbf{k})$ into
spherical shells with radius $k$ and by expanding each shell using a spherical harmonics
basis [17],

\[ I(\mathbf{k}) = \sum_{lm} A_{lm}(k)\, Y_{lm}(\theta, \varphi). \tag{16.2} \]


The coefficients $A_{lm}(k)$ describe the intensity function on the respective shells
and are non-zero only for even $l \in \{0, 2, 4, \ldots, L\}$ because of the symmetry
$I(\mathbf{k}) = I(-\mathbf{k})$ (Friedel's law). In this description, a 3D Euler rotation $\omega$ of $I(\mathbf{k})$
is expressed by transforming the spherical harmonics coefficients according to
$A^{\omega}_{lm'}(k) = \sum_{m} D^{l}_{mm'}(\omega)\, A_{lm}(k)$, using the rotation operators $D^{l}_{mm'}$, which are
composed of elements of the Wigner D-matrix as defined, e.g., in [17], yielding the
rotated intensity,

\[ I_\omega(\mathbf{k}) = \sum_{lmm'} A_{lm}(k)\, Y_{lm'}(\theta, \varphi)\, D^{l}_{mm'}(\omega). \tag{16.3} \]
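Inserting the spherical harmonics expansion of the rotated intensity $I_\omega(\mathbf{k})$, evaluated
at positions $\mathbf{k}_1^*$, $\mathbf{k}_2^*$ and $\mathbf{k}_3^*$ on the Ewald sphere ($\theta_i = \cos^{-1}(k_i \lambda / 4\pi)$), into the expression
for the three-photon correlation, (16.1), yields

\[
\begin{aligned}
t(k_1, k_2, k_3, \alpha, \beta) = {} & \sum_{l_1 l_2 l_3} \sum_{m_1 m_2 m_3} \sum_{m_1' m_2' m_3'} A_{l_1 m_1}(k_1)\, A_{l_2 m_2}(k_2)\, A^{*}_{l_3 m_3}(k_3) \\
& \cdot Y_{l_1 m_1'}(\theta_1(k_1), 0)\, Y_{l_2 m_2'}(\theta_2(k_2), \alpha)\, Y^{*}_{l_3 m_3'}(\theta_3(k_3), \beta) \\
& \cdot \left\langle D^{l_1}_{m_1 m_1'}\, D^{l_2}_{m_2 m_2'}\, D^{l_3 *}_{m_3 m_3'} \right\rangle_\omega ,
\end{aligned}
\tag{16.4}
\]

such that the orientational average only involves the elements of the Wigner D-matrix $D^{l}_{mm'}$.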


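Using the Wigner-3j symbols $\begin{pmatrix} l_1 & l_2 & L \\ m_1 & m_2 & M \end{pmatrix}$ [18], the product of two rotation
elements $D^{l}_{mm'}$ reads

\[
D^{l_1}_{m_1 m_1'}\, D^{l_2}_{m_2 m_2'} = \sum_{L=|l_1 - l_2|}^{l_1 + l_2} \sum_{M M'} (2L+1)\,(-1)^{M - M'}
\begin{pmatrix} l_1 & l_2 & L \\ m_1 & m_2 & -M \end{pmatrix}
\begin{pmatrix} l_1 & l_2 & L \\ m_1' & m_2' & -M' \end{pmatrix} D^{L}_{M M'} .
\tag{16.5}
\]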

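With the orthogonality theorem for orientational averages of the product of two
Wigner D operators,

\[ \left\langle D^{L}_{M M'}\, D^{l_3 *}_{m_3 m_3'} \right\rangle_\omega = \frac{1}{2L+1}\, \delta_{L l_3}\, \delta_{M m_3}\, \delta_{M' m_3'} , \tag{16.6} \]
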
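the three-photon correlation finally reads

\[
\begin{aligned}
t(k_1, k_2, k_3, \alpha, \beta) = {} & \sum_{l_1 l_2 l_3} \sum_{m_1 m_2 m_3} A_{l_1 m_1}(k_1)\, A_{l_2 m_2}(k_2)\, A^{*}_{l_3 m_3}(k_3)
\begin{pmatrix} l_1 & l_2 & l_3 \\ m_1 & m_2 & -m_3 \end{pmatrix} \\
& \cdot \sum_{m_1' m_2' m_3'} (-1)^{m_3 - m_3'}
\begin{pmatrix} l_1 & l_2 & l_3 \\ m_1' & m_2' & -m_3' \end{pmatrix} \\
& \cdot Y_{l_1 m_1'}(\theta_1(k_1), 0)\, Y_{l_2 m_2'}(\theta_2(k_2), \alpha)\, Y^{*}_{l_3 m_3'}(\theta_3(k_3), \beta) .
\end{aligned}
\tag{16.7}
\]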
This expression only involves sums of products of three spherical harmonics
coefficients $A_{lm}(k)$ with known Wigner-3j symbols and spherical harmonics basis
functions $Y_{lm}(\theta, \varphi)$. The numerical calculation of the three-photon correlation (forward
model) is the computationally limiting step in the structure determination
approach. The correlations, expressed in spherical harmonics terms, are faster to
calculate than, e.g., a numerical integration, and they allow for adapting the number
$K(L^2 + 3L + 2)/2$ of spherical harmonics basis functions to the target resolution
via the largest considered wave number $k_\mathrm{cut}$, the number $K$ of used shells between
$0$ and $k_\mathrm{cut}$, and the expansion order $L$. The hierarchical properties of spherical harmonics
basis functions further make it possible to determine the structure first with low angular
resolution and then to successively refine it to higher resolutions and higher expansion
limits, respectively.
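As a quick check of the basis-size formula, the following sketch counts the $(l, m)$ pairs with even $l \le L$ per shell and multiplies by the number of shells $K$. The function names are illustrative only; for $K = 26$ and $L = 18$ the count reproduces the 4940 unknowns quoted in Sect. 16.2.2.

```python
def n_even_orders(L):
    """Number of (l, m) pairs with even l in 0..L and -l <= m <= l."""
    return sum(2 * l + 1 for l in range(0, L + 1, 2))


def n_unknowns(K, L):
    """Closed form K (L^2 + 3L + 2) / 2 for K shells and even expansion order L."""
    return K * (L * L + 3 * L + 2) // 2


print(n_even_orders(18))   # 190 basis functions per shell
print(n_unknowns(26, 18))  # 4940
```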


16.2.2 Bayesian Structure Determination 


Currently no analytic inversion of the three-photon correlation in (16.7) is known, and 
the number of unknowns (e.g., 4940 for K = 26, L = 18) is too large for a straight- 
forward numeric solution. Instead we have developed a probabilistic approach [19] 
in which we asked which intensity $I(\mathbf{k})$ is most likely to have generated the complete
set of measured scattering images and triplets, respectively. To this end, we
considered the Bayesian probability $p$ (with uniform prior) that a given intensity
$I(\mathbf{k})$, expressed in spherical harmonics by $\{A_{lm}(k)\}$, generated the set of triplets
$\{k_1^i, k_2^i, k_3^i, \alpha^i, \beta^i\}_{i=1 \ldots T}$,

\[ p\left(\{k_1^i, k_2^i, k_3^i, \alpha^i, \beta^i\}_{i=1 \ldots T} \,\middle|\, \{A_{lm}(k)\}\right) = \prod_{i=1}^{T} \tilde{t}(k_1^i, k_2^i, k_3^i, \alpha^i, \beta^i). \tag{16.8} \]

Due to the statistical independence of the triplets, this probability $p$ is a product
over the probabilities $\tilde{t}(k_1^i, k_2^i, k_3^i, \alpha^i, \beta^i)$ of observing the individual triplets
$i$, which are given by the normalized three-photon correlation $\tilde{t}(k_1, k_2, k_3, \alpha, \beta)$.
Here, $\tilde{t}(k_1, k_2, k_3, \alpha, \beta)$ is calculated using (16.7) for varying intensity coefficients
$\{A_{lm}(k)\}$, and the coefficients that maximize $p$ are determined
using a Monte Carlo scheme as discussed in Sect. 16.2.4.

In contrast to the direct inversion, the probabilistic approach has the benefit of 
fully accounting for the Poissonian shot noise implied by the limited number of 
photon triplets that are extracted from the given scattering images. We note that this
approach also circumvents the limitation faced in previous work on degenerate
three-photon correlations by Kam [16], in which only those triples are considered where two
photons are recorded at the same detector position. Because all other triples have to
be discarded, Kam's approach is limited to very high beam intensities and cannot
be applied in the present extreme Poisson regime.
Calculating the probability from (16.8) (and the energy in the Monte Carlo scheme)
is computationally expensive due to the typically large number of triplets $T$. We
therefore approximated this product by grouping triplets with similar angles $\alpha$, $\beta$
and distances $k$ into bins and calculated the function $t(k_1, k_2, k_3, \alpha, \beta)$ for each
bin only once, denoted $t_{k_1, k_2, k_3, \alpha, \beta}$, thus markedly reducing the number of function
evaluations to the number of bins. To improve the statistics for each bin, the intrinsic
symmetry of the triple correlation function was also used. In particular, all triplets
were mapped into the sub-region of the triple correlation that satisfies $k_1 \ge k_2 \ge k_3$.
In this mapping, special care was taken to correct for the fact that triplets with
$k_1 = k_2 \ne k_3$ or $k_1 \ne k_2 = k_3$ or $k_1 = k_3 \ne k_2$ occur 3 times more often than triplets with $k_1 = k_2 = k_3$,
and triplets with three distinct values ($k_1 \ne k_2 \ne k_3 \ne k_1$) occur 6 times more often. To compensate for different
bin sizes, each bin was normalized by $k_1 k_2 k_3$.
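The factors 3 and 6 mentioned above are simply the numbers of distinct orderings of the three radial bin indices. A small illustrative sketch (hypothetical helper, not the authors' implementation; how the weights are applied to the histogram is not specified in the text):

```python
from itertools import permutations


def ordering_multiplicity(k1, k2, k3):
    """Number of distinct orderings of three radial bin indices: 1, 3 or 6."""
    return len(set(permutations((k1, k2, k3))))


def sorted_bin(k1, k2, k3):
    """Map a triple of radial bin indices into the sub-region k1 >= k2 >= k3."""
    return tuple(sorted((k1, k2, k3), reverse=True))


for ks in [(4, 4, 4), (4, 4, 7), (2, 5, 9)]:
    print(ks, sorted_bin(*ks), ordering_multiplicity(*ks))
```

Note that this sketch only sorts the radial indices; remapping the angles $\alpha$ and $\beta$ consistently under the same permutation is omitted.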


16.2.3 Reduction of Search Space Using Two-Photon 
Correlations 


The high-dimensional search space may be reduced by utilizing the structural information
contained within the two-photon correlation. In analogy to the three-photon
correlation, the two-photon correlation is expressed as a sum over products of spherical
harmonics coefficients $A_{lm}(k)$ weighted with Legendre polynomials $P_l$ [16, 20],

\[ c_{k_1, k_2, \alpha} = \sum_{l} P_l(\cos\alpha^*) \sum_{m} A_{lm}(k_1)\, A^{*}_{lm}(k_2). \tag{16.9} \]

Please note that the angle $\alpha$ seen on the detector differs from the angle
$\alpha^* = \cos^{-1}(\sin\theta_1 \sin\theta_2 \cos\alpha + \cos\theta_1 \cos\theta_2)$ between the two points in 3D
intensity space due to the Ewald curvature ($\theta = \cos^{-1}(k\lambda/4\pi)$).
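The relation between the detector angle $\alpha$ and the 3D angle $\alpha^*$ can be evaluated directly from the two expressions quoted above. The sketch below is illustrative; the function names and the example wavelength are assumptions. For a vanishing wavelength the Ewald sphere becomes flat ($\theta = \pi/2$) and $\alpha^* = \alpha$.

```python
import math


def polar_angle(k, wavelength):
    """theta = arccos(k * lambda / (4 * pi)), as quoted in the text."""
    return math.acos(k * wavelength / (4 * math.pi))


def alpha_star(k1, k2, alpha, wavelength):
    """Angle between two photon positions in 3D intensity space."""
    t1, t2 = polar_angle(k1, wavelength), polar_angle(k2, wavelength)
    c = math.sin(t1) * math.sin(t2) * math.cos(alpha) + math.cos(t1) * math.cos(t2)
    return math.acos(max(-1.0, min(1.0, c)))  # clamp against rounding errors


# Example values (k in 1/Angstrom, wavelength in Angstrom) chosen for illustration only.
print(alpha_star(2.0, 2.0, 1.0, 1.55))
print(alpha_star(2.0, 2.0, 1.0, 0.0))  # flat-Ewald limit
```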

The inversion yields coefficient vectors $\mathbf{A}^0_l(k) = (A^0_{l,-l}(k), \ldots, A^0_{l,l}(k))$ for all $l \le L < K_\mathrm{max}/2$
and $-l \le m \le l$, as first demonstrated by Kam [16]. However, all rotations
$U_l$ in the $(2l+1)$-dimensional coefficient eigenspaces of $\mathbf{A}^0_l(k)$ are also solutions,

\[ \mathbf{A}_l(k) = U_l\, \mathbf{A}^0_l(k). \tag{16.10} \]

The result implies that the inversion only gives a degenerate solution for the coefficients,
and the intensity cannot be determined solely from two photons. Note that the
maximum $L$, corresponding to the angular resolution of the intensity model, scales
with the number of shells $K_\mathrm{max}$ (or the inverse of the shell spacing $\Delta k$, respectively)
used for the two-photon inversion.
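The degeneracy can be made concrete with a small numerical sketch. It assumes, for simplicity, real-valued coefficient vectors (as in a real spherical harmonics basis) and uses a single plane rotation as an example of $U_l$; all names are hypothetical. The shell-pair products $\sum_m A_{lm}(k_1) A_{lm}(k_2)$ that enter the two-photon correlation are unchanged when the same rotation is applied to every shell, although the coefficients themselves differ.

```python
import math
import random


def shell_products(A):
    """sum_m A_lm(k1) * A_lm(k2) for all shell pairs; A[k] is the vector of shell k."""
    n = len(A)
    return [[sum(a * b for a, b in zip(A[i], A[j])) for j in range(n)] for i in range(n)]


def rotate(A, i, j, beta):
    """Apply the same plane rotation (components i and j, angle beta) to every shell vector."""
    c, s = math.cos(beta), math.sin(beta)
    out = []
    for v in A:
        w = list(v)
        w[i], w[j] = c * v[i] - s * v[j], s * v[i] + c * v[j]
        out.append(w)
    return out


rng = random.Random(0)
l, n_shells = 2, 4
A0 = [[rng.gauss(0, 1) for _ in range(2 * l + 1)] for _ in range(n_shells)]
A1 = rotate(A0, 0, 3, 0.8)

P0, P1 = shell_products(A0), shell_products(A1)
max_diff = max(abs(P0[i][j] - P1[i][j]) for i in range(n_shells) for j in range(n_shells))
print(max_diff)  # rounding-error level: the two-photon data cannot distinguish A0 from A1
```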
16.2.4 Optimizing the Probability Using Monte Carlo


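In our method, we decided to maximize the probability $p$ from (16.8) with a Monte
Carlo/simulated annealing approach on the ‘energy’ function

\[
\begin{aligned}
E\left(\{k_1^i, k_2^i, k_3^i, \alpha^i, \beta^i\} \,\middle|\, \{A_{lm}(k)\}\right)
&= -\log p\left(\{k_1^i, k_2^i, k_3^i, \alpha^i, \beta^i\} \,\middle|\, \{A_{lm}(k)\}\right) \\
&= -\sum_{i=1}^{T} \log \tilde{t}(k_1^i, k_2^i, k_3^i, \alpha^i, \beta^i)
\end{aligned}
\tag{16.11}
\]

in the space of all rotations $U_l$ given by the inversion of the two-photon correlation
discussed in the previous section.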

Each Monte Carlo run is initialized with a random set of rotations $\{U_l\}$ and the
set of unaligned coefficients $\{\mathbf{A}^0_l\}$. In each Monte Carlo step $j$, all rotations $U_l^{j}$ are
varied by small random rotations $\Delta_l(\beta_l)$ such that the updated rotations for each $l$
($l \le L$) read $U_l^{j+1} = \Delta_l(\beta_l) \cdot U_l^{j}$ using stepsizes $\beta_l$ (not to be confused with the
triplet angle $\beta$). In order to escape local minima,
simulated annealing is performed using an exponentially decaying temperature
protocol, $T(j) = T_\mathrm{init} \exp(-j/\tau)$. Steps with an increased energy were also accepted
according to the Boltzmann factor $\exp(-\Delta E/T)$. We further used adaptive stepsizes
such that all $\beta_l$ were increased or decreased by a factor $\mu$ when accepting or
rejecting the proposed steps, respectively. Convergence was improved by using a
hierarchical approach in which the intensity was first determined with low angular
resolution and then refined to high resolution. To this end, the variations of low-resolution
features were “frozen out” faster than the variations of high-resolution
features.
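The three ingredients of the annealing protocol described above (exponential temperature decay, Boltzmann acceptance and adaptive stepsizes) can be sketched as follows. This is a minimal illustration with hypothetical function names, not the authors' implementation; the energy evaluation via (16.11) and the rotation update itself are not included.

```python
import math
import random


def temperature(j, t_init, tau):
    """Exponentially decaying temperature T(j) = T_init * exp(-j / tau)."""
    return t_init * math.exp(-j / tau)


def accept(delta_e, temp, rng):
    """Metropolis criterion: always accept downhill, uphill with exp(-dE / T)."""
    return delta_e <= 0 or rng.random() < math.exp(-delta_e / temp)


def adapt(stepsize, accepted, mu=1.01):
    """Increase the stepsize by mu after an accepted step, decrease it after a rejection."""
    return stepsize * mu if accepted else stepsize / mu


rng = random.Random(1)
step = 0.1
for j, delta_e in enumerate([-0.5, 0.2, 3.0, -0.1]):  # toy energy changes
    ok = accept(delta_e, temperature(j, t_init=1.0, tau=50_000), rng)
    step = adapt(step, ok)
    print(j, ok, step)
```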

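The random rotations $U_l \in \mathbb{R}^{(2l+1) \times (2l+1)}$ were generated using QR decompositions
of matrices whose entries were drawn from a normal distribution as described
by Mezzadri [21]. The rotational variations $\Delta_l(\beta)$ were calculated via the basis
transformation

\[ \Delta_l(\beta) = R_l\, S_l(\beta)\, R_l^{-1} \tag{16.12} \]

with

\[
S_l(\beta) = \begin{pmatrix}
\cos\beta & -\sin\beta & 0 & \cdots & 0 \\
\sin\beta & \cos\beta & 0 & \cdots & 0 \\
0 & 0 & & & \\
\vdots & \vdots & & \mathbb{1}_{2l-1} & \\
0 & 0 & & &
\end{pmatrix}
\tag{16.13}
\]

and random rotation matrices $R_l$ [22]. Here, the sub-matrix $\mathbb{1}_{2l-1}$ in $S_l$ is a $(2l-1)$-dimensional
identity matrix.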
By using the small rotational variations $\Delta_l(\beta)$, the SO($n$) is sampled ergodically.
Approximately $[1/(2 - 2\cos\beta)]\, n \log n$ steps are necessary to achieve sufficient
sampling according to [22]. For the largest search space of $L = 18$ with a rotation
dimension of $n = 37$ ($n = 2L + 1$) and a minimum stepsize of $\beta = 0.025$ rad,
213,777 steps are required to sample rotations in SO(37) sufficiently densely. To
ensure that the search space is exhaustively explored, we aimed at an optimization
length of over 200,000 Monte Carlo steps. To this end, a time constant for the
temperature decrease of $\tau = 50{,}000$ steps was chosen. The initial temperature $T_\mathrm{init}$
was calculated as 10% of the standard deviation of the energy within 50 random steps
away from the starting structure using the initial stepsizes. Further, we used a factor
$\mu = 1.01$ for the adaptive stepsizes. The hierarchical approach was implemented
by distributing the initial stepsizes according to $\beta(l) = (l - 1)$ such that spherical
harmonics coefficients with larger expansion orders $l$ are always varied with a larger
stepsize $\beta(l)$ than coefficients with lower orders.
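The step estimate quoted above can be reproduced with a few lines. The sketch assumes that the logarithm in $[1/(2 - 2\cos\beta)]\, n \log n$ is the natural logarithm, which is the reading that yields the number given in the text.

```python
import math


def ergodic_steps(n, beta):
    """Approximate number of small rotations needed to sample SO(n) with stepsize beta."""
    return n * math.log(n) / (2 - 2 * math.cos(beta))


print(round(ergodic_steps(37, 0.025)))  # 213777
```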


16.3 Method Validation 


Currently, experimental single molecule scattering data are only available for very
large icosahedral viruses. In the absence of single molecule scattering images
of smaller biomolecules such as proteins, we have resorted to synthetic scattering
experiments to validate our method. We have tested the method with a Crambin
molecule, for which we have estimated approx. 20 coherently scattered photons per
image at realistic beam parameters. To stay below this estimate, we generated up to
$3.3 \times 10^{9}$ synthetic scattering images with
only 10 photons on average, totalling up to $3.3 \times 10^{10}$ recorded photons. With an
expected XFEL repetition rate of up to 27 kHz [23], and assuming a hit-rate of 10%,
this data can be collected within roughly two weeks (over 20,000 min). However, the data acquisition time
decreases substantially, e.g., to approx. 30 min, when on average 100 photons per
image are recorded, reducing the total number of required photons by a factor of 100
to $3.3 \times 10^{8}$ (and reducing the number of images by a factor of 1000 to $3.3 \times 10^{6}$).
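The acquisition times follow from simple arithmetic on the quoted repetition rate and hit rate. The sketch below is a back-of-the-envelope estimate with a hypothetical helper function; it ignores dead time and any other experimental overhead.

```python
def acquisition_minutes(n_images, rep_rate_hz=27_000, hit_rate=0.10):
    """Time in minutes to record n_images at the given pulse rate and hit rate."""
    return n_images / (rep_rate_hz * hit_rate) / 60


print(acquisition_minutes(3.3e9))  # 10 photons per image: about 2.0e4 min
print(acquisition_minutes(3.3e6))  # 100 photons per image: about 20 min
```

Under these assumptions the 100-photon case comes out at roughly 20 min, the same order as the approx. 30 min quoted in the text.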

For the synthetic image generation, we approximated the 3D electron density $\rho(\mathbf{x})$
by a sum of Gaussian functions centered at the atomic positions $\mathbf{x}_i$,

\[ \rho(\mathbf{x}) = \sum_{i=1}^{N_\mathrm{atoms}} N_i \exp\left(-\frac{(\mathbf{x} - \mathbf{x}_i)^2}{2\sigma_i^2}\right). \tag{16.14} \]

The heights and variances of the Gaussian spheres depend on the type of atom $i$.
The variances $\sigma_i^2$ correspond to the size of the atoms with respect to their scattering
cross-section and the height is determined by $N_i$, the number of electrons, which are
the potential targets for scattering.

The absolute square of the Fourier transform of the electron density, $I(\mathbf{k}) =
|\mathcal{F}[\rho(\mathbf{x})]|^2$, was used to generate the images. In each shot of a synthetic scattering
experiment, the molecule, and thus also $I(\mathbf{k})$, was randomly oriented and on
average $P$ photons per image were generated according to the distribution given by
the intersection of the Ewald sphere with the randomly oriented intensity $I_\omega(\mathbf{k})$.
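For a density built from isotropic Gaussians as in (16.14), the Fourier transform is available in closed form, so the intensity can be evaluated without a numerical FFT. The sketch below assumes the convention $\mathcal{F}[\rho](\mathbf{k}) = \int \rho(\mathbf{x})\, e^{-i \mathbf{k} \cdot \mathbf{x}}\, d^3x$; the atom list is a toy example and the function names are hypothetical.

```python
import cmath
import math


def structure_factor(k, atoms):
    """Analytic Fourier transform of a sum of isotropic Gaussians.

    atoms: list of (N_i, sigma_i, (x, y, z)); k: 3-vector.
    """
    k2 = sum(c * c for c in k)
    total = 0j
    for n_el, sigma, pos in atoms:
        amp = n_el * (2 * math.pi * sigma**2) ** 1.5 * math.exp(-0.5 * sigma**2 * k2)
        phase = -sum(kc * xc for kc, xc in zip(k, pos))
        total += amp * cmath.exp(1j * phase)
    return total


def intensity(k, atoms):
    """I(k) = |F[rho](k)|^2."""
    return abs(structure_factor(k, atoms)) ** 2


# Toy two-atom "molecule" (electron numbers, widths and positions are made up).
toy_atoms = [(6, 0.5, (0.0, 0.0, 0.0)), (8, 0.4, (1.2, 0.3, -0.7))]
k = (0.3, -1.1, 0.8)
print(intensity(k, toy_atoms), intensity(tuple(-c for c in k), toy_atoms))
```

The two printed values agree, illustrating Friedel's law $I(\mathbf{k}) = I(-\mathbf{k})$ for a real density.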
To generate the distributions numerically, first, a random set of $N_\mathrm{pos}$ positions
$\{\mathbf{K}_i\}$ in the $k_x k_y$-plane was generated according to a 2D Gaussian distribution $G(\mathbf{K})$
with width $\sigma = 1.05$ Å$^{-1}$ (specific to the Crambin intensity). Given a random 3D
rotation $U$, rejection sampling was used to accept or reject each position according
to $\epsilon < I_\omega(U \cdot \mathbf{K}_i)/(M \cdot G(\mathbf{K}_i))$ using uniformly distributed random numbers $\epsilon \in
[0, 1]$ each. Here, the constant $M$ was chosen as $I_\mathrm{max} \cdot \max(G(\mathbf{K}))$ such that the ratio
$I_\omega(U \cdot \mathbf{K}_i)/(M \cdot G(\mathbf{K}_i))$ is below 1 for all $\mathbf{K}$.
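The rejection-sampling step can be sketched in a few lines. This is a simplified illustration with hypothetical names: the random rotation $U$ and the projection onto the Ewald sphere are assumed to be folded into the `intensity` callable, and a toy intensity stands in for the Crambin intensity.

```python
import math
import random


def gauss2d(kx, ky, sigma):
    """Normalized 2D Gaussian proposal density G(K)."""
    return math.exp(-(kx * kx + ky * ky) / (2 * sigma**2)) / (2 * math.pi * sigma**2)


def sample_photons(intensity, n_pos, sigma, m_const, rng):
    """Draw n_pos candidate positions and keep those passing the rejection test."""
    photons = []
    for _ in range(n_pos):
        kx, ky = rng.gauss(0, sigma), rng.gauss(0, sigma)
        ratio = intensity(kx, ky) / (m_const * gauss2d(kx, ky, sigma))
        if rng.random() < ratio:  # accept with probability I / (M * G)
            photons.append((kx, ky))
    return photons


def toy_intensity(kx, ky):
    """Toy stand-in for the oriented intensity on the detector (narrower than G)."""
    return math.exp(-(kx * kx + ky * ky))


rng = random.Random(42)
sigma = 1.05
m_const = 1.0 / gauss2d(0.0, 0.0, sigma)  # ensures I / (M * G) <= 1 for this toy case
shot = sample_photons(toy_intensity, 200, sigma, m_const, rng)
print(len(shot), "photons accepted out of 200 candidates")
```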

In accordance with our most conservative estimate, the number of positions Npos 
was chosen such that on average 10 scattered photons were generated. For assessing 
the dependency of the resolution on the number of scattered photons, additional image 
sets with 25, 50 or 100 scattered photons were also generated (see Sect. 16.3.2). 


16.3.1 Resolution Scaling with Photon Counts 


Starting from the histograms obtained from $3.3 \times 10^{9}$ synthetic scattering images
with 10 photons, we performed 20 independent structure determination runs. For all
runs we used an expansion order $L = 18$, $K = 26$ shells and a cutoff $k_\mathrm{cut} = 2.15$ Å$^{-1}$,
thus setting the maximum achievable resolution to 2.9 Å. To assess the achievable
resolution of the determined Fourier intensities, we calculated 20 real space electron
density maps using the relaxed averaged alternating reflections (RAAR) iterative
phase retrieval algorithm by Luke [13]. Figure 16.3 compares the average of the
20 retrieved densities (a, green shaded structure) with the reference electron
density (b, blue shaded structure), which has been calculated from the Fourier density
(including phases) with the same cutoff $k_\mathrm{cut}$ as (a). The cross-correlation between the
two densities is 0.9.

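The resolution of the phased electron densities was characterized by the Fourier
shell correlation (FSC),

\[ \mathrm{FSC}(k) = \frac{\sum_{k_i \in k} F_1(k_i) \cdot F_2(k_i)^{*}}{\sqrt{\sum_{k_i \in k} |F_1(k_i)|^2 \cdot \sum_{k_i \in k} |F_2(k_i)|^2}} . \tag{16.15} \]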


We have adopted the common definition of the resolution from cryo-EM [24] for
cases in which the reference density is known. The resolution is then defined via
the wave number $k_\mathrm{res}$ at which $\mathrm{FSC}(k) = 0.5$, yielding a radial resolution $\Delta r =
2\pi/k_\mathrm{res}$. In cases where the two densities in the FSC are retrieved
from independent image sets (cross-validation), a lower cut-off $\mathrm{FSC}(k) = 0.143$ is
typically used. Here, we have achieved a near-atomic resolution of 3.3 Å from the
correlation derived from $3.3 \times 10^{9}$ images.
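A minimal sketch of the FSC for a single shell and of the conversion from $k_\mathrm{res}$ to a real-space resolution is given below. The helper names are hypothetical, and taking the real part of the normalized sum is an assumption made here for complex-valued input.

```python
import math


def fsc_shell(f1, f2):
    """Fourier shell correlation for one shell; f1, f2 hold the complex samples on it."""
    num = sum(a * b.conjugate() for a, b in zip(f1, f2))
    den = math.sqrt(sum(abs(a) ** 2 for a in f1) * sum(abs(b) ** 2 for b in f2))
    return (num / den).real


def resolution(k_res):
    """Radial resolution Delta r = 2 * pi / k_res."""
    return 2 * math.pi / k_res


shell = [1 + 2j, -0.5 + 0.1j, 0.3 - 0.7j]
print(fsc_shell(shell, shell))                # identical maps: correlation 1
print(fsc_shell(shell, [-z for z in shell]))  # inverted map: correlation -1
print(resolution(2.15))                       # about 2.9 Angstrom for the cutoff used here
```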

Next, we have determined the structure from an increasing number of images to
assess how the resolution scales with the total number of observed photons and,
Fig. 16.3 Comparison of the retrieved electron density (a) and the reference electron density (b).
The reference density (b) was calculated from the known Fourier density using the same cutoff
$k_\mathrm{cut} = 2.15$ Å$^{-1}$ in reciprocal space as (a). The resolution of the retrieved density is 3.3 Å, the
resolution of the reference density is 2.9 Å and the cross-correlation between the two densities is
0.9


hence, the number of recorded images. To this end, electron densities were calculated
and averaged as above starting from $1.3 \times 10^{6}$ and going up to $3.3 \times 10^{9}$ images
($4.7 \times 10^{8}$ up to $1.2 \times 10^{12}$ triplets).

Figure 16.4 shows the FSC curves of all retrieved (averaged) densities along with
the 0.5 cutoff (vertical dashed line) and the corresponding resolutions (inset).
Figure 16.5 visualizes how the resolution improves with the increasing number of
detected photons by comparing four electron densities that were retrieved from
histograms with $2.0 \times 10^{8}$ to $3.3 \times 10^{10}$ photons.

As mentioned before, the best electron density was retrieved with a near-atomic
resolution of 3.3 Å (Fig. 16.5d) from the histogram that was derived from a total of
$3.3 \times 10^{10}$ photons. Decreasing the number of photons by a factor of 10 degraded the
resolution only slightly, by 0.4 Å to 3.7 Å (Fig. 16.5c), which indicates that very likely
fewer than $3.3 \times 10^{10}$ photons suffice to achieve near-atomic resolution. If far
fewer photons are recorded, e.g., $2.0 \times 10^{8}$, the resolution degrades markedly to
7.8 Å (Fig. 16.5a) and even to 14 Å for $1.3 \times 10^{7}$ photons. For comparison,
the diameter of Crambin is 17 Å.

To address the question of how much further the resolution can be improved, we
mimicked an experiment with an infinite number of photons by determining the intensity
from the analytically calculated three-photon correlation. As can be seen in Fig. 16.4
(purple line), the resolution improved only slightly, by 0.1 Å to about 3.2 Å, indicating
that at this point either the expansion order $L$ or insufficient convergence of the Monte
Carlo based structure search became resolution limiting. To distinguish between
these two possible causes, we phased the electron density directly from the reference
intensity, using the same expansion order $L = 18$ as in the other experiments.
Fig. 16.4 Fourier shell correlations (FSC) of densities retrieved from $1.3 \times 10^{7}$ to $3.3 \times 10^{10}$
photons ($4.7 \times 10^{8}$ to $1.2 \times 10^{12}$ triplets) and infinite photon number. As a reference, the
“optimal” FSC is shown (dashed grey), which was calculated directly from the known intensity using
the same expansion parameters. The inset shows the corresponding resolutions estimated from
$\mathrm{FSC}(k_\mathrm{res}) = 0.5$


The reference intensity is free from convergence issues of the Monte Carlo structure
determination, and the resulting electron density only includes the phasing errors
introduced by the limited angular resolution of the spherical harmonics expansion in
Fourier space. The FSC curve of the “optimal phasing” (grey dashed) shows only a
minor improvement in resolution to 3.1 Å, indicating that the Monte Carlo search degrades
the resolution by 0.1 Å. The remaining 0.2 Å difference to the optimal resolution of
2.9 Å at the given $k_\mathrm{cut}$ (not shown) is attributed to the finite expansion order $L$ and
the corresponding phasing errors.

We have also independently assessed the overall phasing error by calculating the
intensity shell correlation (ISC) between the intensities of the phased electron densities,
$I_\mathrm{phased} = |\mathcal{F}[\rho_\mathrm{retrieved}]|^2$, and the intensities before phasing, $I_\mathrm{retrieved}$. The phasing
method does not markedly deteriorate the structures.


Fig. 16.5 Electron densities retrieved from a $2.0 \times 10^{8}$, b $8.2 \times 10^{8}$, c $3.3 \times 10^{9}$ and
d $3.3 \times 10^{10}$ photons
16.3.2 Impact of the Photon Counts per Image


The maximum number of triplets $T$ that can be collected from an image with $P$
photons is $T = P \cdot (P - 1) \cdot (P - 2)/6$. However, these triplets are not all statistically
independent; instead, starting from 3 photons, each additional photon adds only
two real numbers to the triple correlation: a new angle $\beta$ (with respect to another
photon) and a new distance $k$ to the detector center.
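The gap between the number of triplets and the number of independent quantities grows quickly with $P$, as the following sketch shows. The count $5 + 2(P - 3)$ is an inference from the statement above (five numbers for the first triplet, two more per additional photon) and is not a formula given in the text.

```python
import math


def n_triplets(p):
    """Number of photon triplets in an image with p photons: p (p-1) (p-2) / 6."""
    return math.comb(p, 3)


def n_independent_numbers(p):
    """Five numbers (k1, k2, k3, alpha, beta) for 3 photons, plus two per extra photon."""
    return 5 + 2 * (p - 3)


for p in (3, 10, 25, 50, 100):
    print(p, n_triplets(p), n_independent_numbers(p))
```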

The sampling of the three-photon correlation is improved by either collecting
more photons per image $P$ or by collecting more images $I$. However, because for
each image, the orientation (3 Euler angles) needs to be inferred, the total amount
of information that remains available for structure determination increases with the
number of photons per image. Therefore, for every structure determination method,
including ours, increasing $P$ is preferred over increasing $I$, especially at low photon
counts. For larger photon counts, the ratio between the 3 Euler angles and $P$ becomes
small and hence also the information asymmetry between $P$ and $I$.

To assess this effect, we asked how the resolution depends on the number of
images $I$ and the photons per image $P$ and therefore carried out additional synthetic
experiments using image sets with 10, 25, 50 and 100 average photons $P$ per shot at
different image counts, yielding different total numbers of photons. In Fig. 16.6, the
achieved resolutions are shown as a function of the number of collected photons for
four different $P = [10, 25, 50, 100]$. For the best achievable resolution of 3.3 Å, e.g.,
the total number of required photons decreases by a factor of 100 from $3.3 \times 10^{10}$
to $3.3 \times 10^{8}$ photons (and the number of images decreases by a factor of 1000 from
$3.3 \times 10^{9}$ to $3.3 \times 10^{6}$ images) when increasing the photons per image from 10 to
100, thus substantially decreasing the data acquisition time from over 20,000 min to
only 30 min.
Fig. 16.6 The resolution as a function of the total number of photons collected from images with
10, 25, 50 and 100 photons on average


16.3.3 Structure Results in the Presence of Non-Poissonian 
Noise 


To assess how additional noise (beyond the Poisson noise due to low photon counts)
affects the achievable resolution, we have carried out synthetic scattering experiments
including Gaussian distributed photons, $G(\mathbf{k}, \sigma) = (2\pi\sigma^2)^{-1/2} \exp(-|\mathbf{k}|^2/2\sigma^2)$ (see
Fig. 16.7), as a simple noise model. From the generated scattering images, intensities
$S(\mathbf{k})$ were determined with the discussed structure determination scheme.

Assuming that the noise is independent of the molecular structure, the obtained
intensities $S(k) = I(k) + \gamma N(k)$ are a linear superposition of the molecule's intensity
I(k) and the intensity of the unknown noise N(k). Accordingly, the noise was
subtracted from S(k) in 3D Fourier space using our noise model $N(k) = G(k, \sigma)$
and the estimated signal to noise ratio γ. Since the spherical harmonics expansion of
a Gaussian distribution is described by a single coefficient $G_{l=0,m=0}(k) = G(k, \sigma)$
on each shell k, the noise subtraction simplified to $A_{l=0,m=0}(k) \to A_{l=0,m=0}(k) - \gamma G(k, \sigma)$.
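
A minimal Python sketch of this shell-wise subtraction is given below. It assumes, as stated above, that the noise contributes only to the l = 0, m = 0 coefficient, with weight γG(k, σ). The function names, the example shells and the coefficient values are hypothetical and serve only as an illustration, not as the implementation used for the results.

```python
import math

def gaussian_noise(k, sigma):
    """Noise model G(k, sigma) with the normalisation given in the text."""
    return math.exp(-k * k / (2.0 * sigma * sigma)) / math.sqrt(2.0 * math.pi * sigma * sigma)

def subtract_noise(a00, shells, sigma, gamma):
    """Subtract gamma * G(k, sigma) from the l = 0, m = 0 coefficient of every shell k."""
    return [a - gamma * gaussian_noise(k, sigma) for a, k in zip(a00, shells)]

# Hypothetical example: three shells (in 1/Å) with made-up noise-free coefficients.
shells = [0.2, 0.6, 1.0]
clean = [5.0, 2.0, 0.5]
sigma, gamma = 0.75, 0.3

# Add the model noise, then remove it again with the known sigma and gamma.
noisy = [a + gamma * gaussian_noise(k, sigma) for a, k in zip(clean, shells)]
recovered = subtract_noise(noisy, shells, sigma, gamma)
print(recovered)
```

In this idealised round trip the noise-free coefficients are recovered up to floating-point error; with real data the quality of the subtraction depends on how well σ and γ are estimated.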

As discussed in the main text, we assessed the effect of noise for different
Gaussian widths (σ = [0.5, 0.75, 1.125, 2.5] Å⁻¹) and several signal to noise ratios
γ ∈ [10%, ..., 50%]. Figure 16.7 compares the Crambin intensity (green) with the
different Gaussian distributions (purple shades, black) at a signal to noise ratio of
γ = 100%.

The figure also shows the noise expected from Compton scattering (grey), which
was estimated using the Klein-Nishina differential cross-section [25] given in Eq. (16.16).

Fig. 16.7 Comparison of linear cuts through the normalized intensities of noise distributed according
to Gaussian functions with widths σ = [0.5, 0.75, 1.125, 2.5] Å⁻¹ (purple shades and black),
noise from Compton scattering (grey) and noise from a disordered water shell of 5 Å thickness
(aqua). A cut through the Crambin intensity without noise (green) is given for reference. Note that,
due to the normalization in 3D, the noise intensities are shown at a signal to noise ratio γ = 100%; at
different signal to noise ratios, the noise intensities are shifted vertically with respect to the Crambin
intensity


$$d\sigma = \frac{\alpha^2}{2 m_e^2} \left(\frac{E'}{E}\right)^2 \left[\frac{E'}{E} + \frac{E}{E'} - \sin^2\theta\right] d\Omega, \tag{16.16}$$
with the scattering angle θ, the energy of the incoming photons E, the energy of
the scattered photon $E' = E/(1 + (E/m_e c^2)(1 - \cos\theta))$, the fine structure constant α =
1/137.04 and the electron rest mass m_e = 511 keV/c² (the prefactor is written in natural
units, ħ = c = 1). As can be seen, the noise
from Compton scattering (grey) is described well by a Gaussian distribution with
width σ = 2.5 Å⁻¹ (black), and thus was used to approximate incoherent scattering.
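
As a hedged illustration of Eq. (16.16), the following Python sketch evaluates the Klein-Nishina cross-section. To avoid unit conversions it returns dσ/dΩ in units of the squared classical electron radius, which is our choice and not necessarily the normalisation used for Fig. 16.7. The function names and the 8 keV photon energy are hypothetical.

```python
import math

ELECTRON_REST_ENERGY_KEV = 511.0  # m_e c^2, as quoted in the text

def scattered_energy(energy_kev, theta):
    """Energy E' of a photon Compton-scattered by the angle theta (radians)."""
    return energy_kev / (1.0 + energy_kev / ELECTRON_REST_ENERGY_KEV * (1.0 - math.cos(theta)))

def klein_nishina(energy_kev, theta):
    """Differential cross-section in units of the squared classical electron radius."""
    ratio = scattered_energy(energy_kev, theta) / energy_kev  # E'/E
    return 0.5 * ratio ** 2 * (ratio + 1.0 / ratio - math.sin(theta) ** 2)

# Hypothetical example: an 8 keV photon scattered forward and at 90 degrees.
forward = klein_nishina(8.0, 0.0)
sideways = klein_nishina(8.0, math.pi / 2.0)
print(forward, sideways)
```

In the limit of low photon energies the expression reduces to the Thomson result (1 + cos²θ)/2 in the same units, which provides a simple sanity check.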

Finally, we also estimated the noise from the disordered fraction of the water shell 
by averaging the intensities of 100 Crambin structures with different 5 Å-thick water
shells. The resulting intensity (aqua) is similar to the reference intensity, with less
signal in the intermediate regions (0.2 Å⁻¹ < k < 1.0 Å⁻¹) and more signal in the
center and the high-resolution regions (k > 1.0 Å⁻¹). Since the noise of the water
shell depends on the structure of the biomolecule, potentially combined with ordered 
water molecules, it is unlikely to be well described by our simple Gaussian model. 
Therefore, simple noise subtraction will be challenging, and more advanced iterative 
techniques will be required. 

In Fig. 16.8, the electron densities from the discussed runs are compared to each 
other. 

Fig. 16.8 Comparison of the electron densities retrieved from images containing noise of different
levels γ ∈ [10%, ..., 50%] and widths σ ∈ [0.5, 0.75, 1.125, 2.5]


16.4 Structure Determination from Multi-Particle Images


Structure determination approaches are usually limited by the total number of single 
molecule shots that can be recorded. Remarkably, our method can process images 
with multiple illuminated particles because the two- and three-photon correlations 
of these images are connected to the correlations of the single particle shots. To
show this relation, we derive here the connection for the two-particle case.

The intensity of an image containing two randomly oriented particles, $I_2(\mathbf{k})$, is the
superposition of the individual particle intensities with the relative orientation
being random,

$$I_2(\mathbf{k}) = I(\mathbf{k}) + I'(\mathbf{k}), \tag{16.17}$$

where $I'(\mathbf{k})$ denotes the intensity of the second, independently oriented particle.


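The two-photon correlation then reads (the superscript denoting the number of particles per image),

$$\begin{aligned}
c^{(2)}_{k_1,k_2,\alpha} &= \langle I_2(\mathbf{k}_1)\, I_2(\mathbf{k}_2) \rangle_\omega \\
&= \langle I(\mathbf{k}_1) I(\mathbf{k}_2) + I(\mathbf{k}_1) I'(\mathbf{k}_2) + I'(\mathbf{k}_1) I(\mathbf{k}_2) + I'(\mathbf{k}_1) I'(\mathbf{k}_2) \rangle_\omega \\
&= 2\, c^{(1)}_{k_1,k_2,\alpha} + 2\, \bar{I}(k_1)\, \bar{I}(k_2),
\end{aligned} \tag{16.18}$$

and the three-photon correlation of the two-particle case is calculated as,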


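$$\begin{aligned}
t^{(2)}_{k_1,k_2,k_3,\alpha,\beta} &= \langle I_2(\mathbf{k}_1)\, I_2(\mathbf{k}_2)\, I_2(\mathbf{k}_3) \rangle_\omega \\
&= \langle (I(\mathbf{k}_1) + I'(\mathbf{k}_1)) (I(\mathbf{k}_2) + I'(\mathbf{k}_2)) (I(\mathbf{k}_3) + I'(\mathbf{k}_3)) \rangle_\omega \\
&= \langle I(\mathbf{k}_1) I(\mathbf{k}_2) I(\mathbf{k}_3) + I'(\mathbf{k}_1) I(\mathbf{k}_2) I(\mathbf{k}_3) \\
&\qquad + I(\mathbf{k}_1) I'(\mathbf{k}_2) I(\mathbf{k}_3) + I(\mathbf{k}_1) I(\mathbf{k}_2) I'(\mathbf{k}_3) \\
&\qquad + I'(\mathbf{k}_1) I'(\mathbf{k}_2) I(\mathbf{k}_3) + I'(\mathbf{k}_1) I(\mathbf{k}_2) I'(\mathbf{k}_3) \\
&\qquad + I(\mathbf{k}_1) I'(\mathbf{k}_2) I'(\mathbf{k}_3) + I'(\mathbf{k}_1) I'(\mathbf{k}_2) I'(\mathbf{k}_3) \rangle_\omega \\
&= 2\, t^{(1)}_{k_1,k_2,k_3,\alpha,\beta} + 2\, \bar{I}(k_1)\, c^{(1)}_{k_2,k_3} + 2\, \bar{I}(k_2)\, c^{(1)}_{k_1,k_3} + 2\, \bar{I}(k_3)\, c^{(1)}_{k_1,k_2},
\end{aligned} \tag{16.19}$$

where each two-photon correlation is taken at the angle enclosed by the respective pair of photons.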
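
The structure of Eqs. (16.18) and (16.19) can be checked numerically on a toy model. The sketch below is an assumption-laden illustration and not the authors' code: the "particle" is a ring of angular intensity bins, a random orientation is a cyclic shift of the ring, and the orientational average runs over all shifts. In this toy model the mean intensity is the same for every bin, whereas in the real case it depends on |k|. All function names are hypothetical.

```python
import itertools
import random

def rotate(ring, shift):
    """Cyclically shift a ring of intensities: a toy stand-in for a random orientation."""
    n = len(ring)
    return [ring[(i - shift) % n] for i in range(n)]

def correlations(images):
    """Two- and three-photon products averaged over a list of images."""
    n, m = len(images[0]), len(images)
    c = {(a, b): sum(im[a] * im[b] for im in images) / m
         for a, b in itertools.product(range(n), repeat=2)}
    t = {(a, b, d): sum(im[a] * im[b] * im[d] for im in images) / m
         for a, b, d in itertools.product(range(n), repeat=3)}
    return c, t

def max_deviation(ring):
    """Largest deviation between two-particle correlations and their single-particle prediction."""
    n = len(ring)
    mean = sum(ring) / n  # orientational average, identical for every bin in this toy model
    single = [rotate(ring, s) for s in range(n)]
    # All ordered pairs of orientations: two independent particles per image.
    double = [[x + y for x, y in zip(p, q)] for p in single for q in single]
    c1, t1 = correlations(single)
    c2, t2 = correlations(double)
    dev_c = max(abs(c2[a, b] - (2 * c1[a, b] + 2 * mean * mean)) for a, b in c1)
    dev_t = max(abs(t2[a, b, d] - (2 * t1[a, b, d]
                + 2 * mean * (c1[a, b] + c1[a, d] + c1[b, d]))) for a, b, d in t1)
    return dev_c, dev_t

random.seed(0)
ring = [random.random() for _ in range(6)]
dev_c, dev_t = max_deviation(ring)
print(dev_c, dev_t)
```

Both deviations vanish up to floating-point error, i.e. the two-particle correlations are fully determined by the single-particle correlations and the mean intensity.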


The expressions above are readily generalized to the N-particle case, and the only
remaining unknowns are the mixture ratios for the N_i-particle images, i.e. the fraction
of images containing N_i particles. These ratios are equivalent to the ratios between
the integrated intensities of the individual images, which identify the total number
of particles in each image, and can therefore be calculated from the experimental data
without additional effort.

The robustness of the two- and three-photon correlation in the presence of multiple
particles in the beam potentially makes our method also interesting for other types
of experiments such as fluctuation X-ray scattering (FXS) [26, 27], which is similar
to solution scattering. In conventional solution scattering, the orientational averaging
that occurs during the X-ray illumination results in a signal which carries only
1-dimensional (radial) intensity information, and all angular information is averaged
out. In FXS experiments, however, the X-ray pulses from synchrotrons or free electron
lasers are much shorter than the orientational diffusion times of the molecules,
such that they appear to be fixed in space. In each image, multiple particles with different
orientations are recorded and, as a result, speckle patterns emerge from which
angular correlations can be calculated as described above.


Chapter 17
Development of Ultrafast X-ray Free
Electron Laser Tools in (Bio)Chemical
Research


Simone Techert, Sreevidya Thekku Veedu and Sadia Bari 


Abstract The chapter will focus on fundamental aspects and methodological chal- 
lenges of X-ray free electron laser research and recent developments in the related 
field of ultrafast X-ray science. Selected examples proving the “molecular movie capabilities”
of free-electron laser radiation in investigating gas phase chemistry, chemistry
in liquids and transformations in the solid state will be introduced. They will be discussed
in the context of ultrafast X-ray studies in complex biochemical research, and
time-resolved X-ray characterisation of energy storage materials and energy bionics.


17.1 Introduction 


After a preparation phase of almost twenty years, from the first vision of a common
research effort in synchrotron radiation in Europe (1975) up to the first electron beam
entering the first high-brilliance, third generation X-ray synchrotron ring in 1992 (at
the European Synchrotron Radiation Facility), synchrotron radiation has proven its
unique X-ray photon characteristics in brilliance, coherence, pulsed properties and
high repetition frequencies. These allow for the development of various synchrotron-typical
experiments such as high-resolution X-ray crystallography and anomalous
X-ray scattering, the various types of X-ray spectroscopy techniques, time-resolved
X-ray methods and coherent and incoherent X-ray diffraction techniques (to name
some examples). The techniques have been adapted for and applied to materials
research, chemistry, solid state and biophysics, earth and planetary research, astrophysics
etc.

At the beginning of the 3rd millennium, synchrotron researchers faced a similar situation
as at the beginning of the 1980s, again with a 20-year agenda from 2000 on: entering
their next research “quantum step” by extending the novel technique of the so-called
Free-Electron Laser (FEL) principle [1-6] from short arrangements of water-cooled
magnets to very long arrangements of water-cooled (and now even superconducting)
magnets [1-6]. Since in free-electron lasers high-speed electrons move freely in a
magnetic field [1, 2], being accelerated and decelerated in the field and thereby generating
radiation, changing the properties of the magnetic fields as explained allows
for moving from the FEL generation of microwaves [2, 4] to that of X-ray
radiation [4-6]. Free-electron lasers are therefore fully tunable and they have the
widest frequency range of any laser type.

By developing the FEL principle to the hard X-ray regime [4-7], a gap between
the “traditional” synchrotron radiation world and the FEL radiation world has been
closed, which is one of the reasons why FELs are sometimes also called “synchrotrons
of the 4th generation” (at least from the perspective of synchrotron researchers).
As a consequence, the properties of the novel FEL sources and the properties of
the produced X-ray radiation have been amplified by many orders of magnitude
compared to current synchrotron sources of the 3rd generation.

Milestones in the FEL development agenda were the first soft X-ray free-electron
laser in operation (2005, the Free-Electron LASer FLASH at DESY in Hamburg),
the first hard X-ray free-electron laser in operation (2009, the Linac Coherent Light
Source LCLS at SLAC in Stanford, which is a so-called 1st generation FEL based
on a low-repetition frequency approach), and the first high-repetition frequency hard
X-ray FEL in operation (2017, the European X-ray Free Electron Laser European
XFEL at DESY and in Schenefeld). How these “quantum jumps” in X-ray properties
can be used for novel approaches of research in chemistry will be the topic of this
chapter.

To reflect milestone developments from 2006 on, the chapter draws on
recent research summaries [8-10] as well as novel developments.

If one reads about chemical reactions and the measurement of real-time responses 
in chemical reactions, one very quickly ends up with the concept of the resolution of 
measurements on ultrafast time scales, which are characteristic for the movement of 
atoms in molecules, i.e. femtoseconds. A femtosecond is defined as the millionth of
a billionth of a second. Life-relevant motions, however, can be as slow as seconds or
even extend to time scales of minutes or hours. These time scale differences
originate from the complexity of the coordination space of a proceeding reaction [8-14].

In a classical kinetic scheme, following the explanatory approach, the gradients
on the potential energy hypersurface define the molecular dynamics. Statistically
population-weighted, they combine to give the kinetics of a chemical reaction. The coordinates
describing the dynamics and kinetics of a chemical reaction are the reaction
coordinate and the energy. The reaction coordinate is defined as a one-dimensional
projection of the reactant’s and product’s normal coordinates, which span the potential
energy hypersurface of reactant and product and the potential energy hypersurfaces
of their transitions (Fig. 17.1). The energy gradient along a reaction coordinate
is defined as reaction dynamics, the energy gradient along the normal coordinates as
molecular dynamics.

Fig. 17.1 References [11-16]: Potential energy hypersurface of a chemical reaction. The reaction
coordinate (abscissa) of a chemical reaction is defined as a one-dimensional projection of the normal
coordinates spanning the reactant’s potential energy hypersurface versus the normal coordinates
spanning the product’s potential energy hypersurface, and their corresponding transitions [11, 14].
The figure is adapted from [8-10] and references therein

Commonly, the potential energy is presented as the characteristic curve or hypersurface
in the graph, and the ordinate axis presents the sum of the potential and
kinetic energy of the nuclei involved in the chemical reaction (Fig. 17.2). Potential
and kinetic energy of molecules can be disentangled through the projection of the
potential energy onto the total energy axis. The activated complex and transition state
(according to Eyring) include an imaginary vibrational mode [11-14, 16].

From a chemical physicist’s point of view, one would like to understand which 
elementary chemical processes happen at which time scales, and how these time 
scales are inter-connected. To what extent do structural motifs “freeze in” time and
dynamics information of chemical reactions? Which type of apparatus needs to be 
built and which kind of methods need to be developed for investigating the created 
femtosecond “time stamps” in the structure of complex matter during a chemical or 
biochemical reaction? 

Ultrafast X-ray methods bear the potential of determining the complexity of
chemical reactions while they proceed, in particular in the bulk, with techniques
which utilize the specific characteristics of X-ray/matter interaction: in well-ordered
systems, X-ray crystallography as a Thomson scattering process allows for
element-specific determinations such as electron densities (redox states) and high-precision
spatial resolution determination of atoms in lattices from which hydrogen
bonding, chemical bonding or van der Waals stacks can be derived. In less-ordered
and disordered systems, X-rays deliver element-specific information of the investigated
molecules utilizing, for example, X-ray spectroscopy, X-ray absorption, photo-induced
electron cascades, or X-ray and electron emission properties. Site-specific
information is obtained by elastic and inelastic scattering processes such as multidimensional
X-ray spectroscopy, X-ray diffraction, or X-ray scattering. The chemical
consequences of X-ray/matter interactions leading to fragmentation can be characterized
by X-ray mass spectrometry.

Fig. 17.2 References [11-16]: Total energy of a chemical system as the sum of potential energy
(V) and kinetic energy (T) of the molecules (ordinate). Inside the graph, only the contribution of the
potential energy is plotted; its projection onto the ordinate allows determining the kinetic energy
[12, 16]. The figure is adapted from [8-10] and references therein

Characteristic for the X-ray photons generated in synchrotrons and free-electron 
lasers are their 


1. energy tunability (allowing for excitation-energy-sensitive methods like X-ray
spectroscopy or advanced X-ray diffraction methods)

2. pulsed structure (allowing for in-situ and time-resolved X-ray methods)

3. defined polarization (allowing for advances in X-ray spectroscopy)

4. coherence (allowing for X-ray imaging or correlation spectroscopy methods)

5. high flux (allowing for high-resolution X-ray experiments in all experimental
domains)


Fourth generation accelerator-based light sources (free-electron lasers, FELs) in
the VUV or X-ray regime deliver ultra-brilliant coherent radiation in very short pulses
(10¹² to 10¹³ photons/bunch/10-100 fs). In order to fully exploit their unique photon
capabilities, novel instrumentation is required based on single-shot (collection)
schemes. Moreover, hundreds up to trillions of fragment particles, ions, electrons
or scattered photons can emerge when a single light flash impinges on matter with
intensities up to (predicted for the XFEL) 10°” W cm⁻². In order to meet these challenges,
in the early days of FLASH (Free-Electron Laser in Hamburg [16a]) and
the LCLS (Linac Coherent Light Source [16b]), various experimental chambers and
endstations have been designed. Further constructions, in particular those of relevance for
this chapter, have been developed or are under construction at the European XFEL
[16e], as briefly explained in the following.

Starting from this introduction as a comprehensive summary of the basic princi- 
ples, in the following, various FEL methods developed so far will be summarised, 
including the concept of filming chemical reactions in real time utilizing ultrafast 
high-flux X-ray sources [11-16], X-ray Diffraction and Crystallography for Con- 
densed State Chemistry Studies—Crystallography with Ultrahigh Temporal and 
Ultrahigh Spatial Resolution [14, 17-52] and Applications in organic electronics 
and energy research [14, 53-59]; from “From Local to Global: Ultrafast Multidimensional
Soft X-ray Spectroscopy and Ultrafast X-ray Diffraction Shake Their Hands”
[60-76] and Applications in Bimolecular Reaction Studies and Photocatalysis [77-80]
to Applications in Unimolecular Liquid Phase Reaction Dynamics [81-89] and
Applications in Bioelectronics, Aqueous and Prebiotics Reaction Dynamics [90-108] 
to Applications in Biophysics and Gas Phase Biomolecules [109-121]; from Ultra- 
fast Imaging of Gas Phase Reactions [122-127] and Applications in Nanoscience 
and Multiphoton-Ionisations [128-150] to Applications in Unimolecular Gas Phase 
Dynamics [122, 151-163] and Outlook. 


17.2 The Concept: Filming Chemical Reactions in Real 
Time Utilizing Ultrafast High-Flux X-ray Sources 


In a proof-of-principle experiment at the white beam beamline at the ESRF (the
European Synchrotron Radiation Facility) in 2001, it was demonstrated that
high-flux, pulsed X-rays, as created with synchrotrons of the 3rd generation, can
act as the “photons of choice” for studying the dynamics and kinetics of small chemical
systems on their complex reaction landscape [1]. These studies have been used
to define various expectation values for time-resolved experiments at free-electron
lasers, preparing the ground for ultrafast X-ray experiments at these sources. Since
then, the phrase “recording the molecular movie” has also been in use (Fig. 17.3)
[2-6b].

Figure 17.3 summarizes the principles of such a “molecular movie” approach: 
after the initialization of a chemical reaction with a short laser pulse, ultrafast
X-ray FEL pulses take snapshots of the X-ray spectroscopic or X-ray diffraction
signal. By varying the time delay between laser pump and X-ray probe pulse,
information about the structural changes as a function of time is collected.
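
The bookkeeping behind such a delay scan can be sketched in a few lines of Python. The example below is a simplified illustration under stated assumptions and not the data pipeline of any facility: each shot is assumed to be a pair of a pump-probe delay (in fs) and a signal vector, and a movie frame is the average of all signals recorded at the same delay. The function name `assemble_movie` and the numbers are hypothetical.

```python
from collections import defaultdict

def assemble_movie(shots):
    """Average all single-shot signals per pump-probe delay and sort the frames by delay."""
    grouped = defaultdict(list)
    for delay_fs, signal in shots:
        grouped[delay_fs].append(signal)
    frames = []
    for delay_fs in sorted(grouped):
        signals = grouped[delay_fs]
        # Element-wise mean over all shots recorded at this delay.
        mean_signal = [sum(values) / len(signals) for values in zip(*signals)]
        frames.append((delay_fs, mean_signal))
    return frames

# Hypothetical shots: (delay in fs, signal vector), recorded in arbitrary order.
shots = [(100, [1.0, 2.0]), (0, [0.0, 0.0]), (100, [3.0, 4.0])]
frames = assemble_movie(shots)
print(frames)
```

Sorting by delay turns the randomly ordered single shots into a time-ordered sequence of frames, which is the sense in which the snapshots form a “movie”.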

Time-wise, the criterion for “recording the molecular movie” is met when the
time resolution of the pump and probe sources matches the time scales of the structural
dynamics investigated. The resolution criterion for structural dynamics studies is
fulfilled in chemistry when the refined structure allows for determining the electron
density or the charge densities (which are equal in the redox state) around a moving
atom. High-resolution X-ray crystallography studies provide such precision [2, 8-16,
21, 23, 25, 35, 44, 46, 51].

Fig. 17.3 References [11-16]: Principle of the “molecular movie”. After the initialization of a
chemical reaction by a pulsed trigger laser, ultrafast X-ray snapshots and photographs of the X-ray
pulses are collected as a function of time (courtesy of DESY, MPIbpC and European XFEL)


17.3 X-ray Diffraction and Crystallography for Condensed 
State Chemistry Studies—Crystallography with 
Ultrahigh Temporal and Ultrahigh Spatial Resolution 


Crystallography with ultrahigh temporal and spatial resolution allows studying pho- 
tochemical reactions beyond conventional quantum chemical approaches. Far beyond 
any present laboratory technique, time-resolved synchrotron (picosecond time resolution)
and FEL (femtosecond time resolution) experiments emphasize the uniqueness
of the pulsed, ultrafast, highly brilliant and coherent X-ray methods and metrology.
For chemical bond breaking and bond formation, the criterion for spatial resolution
is met in periodic systems (crystallography) when diffractograms with 0.01-0.001 Å
resolution yield high precision structural information [8-16, 21-23, 25, 35, 44, 46,
51].

Figure 17.4 reflects the changes of X-ray synchrotron beam characteristics when
evolving from synchrotrons of the 2nd generation towards hard X-ray free electron
lasers. The diffractograms have been collected on a molecular crystal of the
same species, the same crystal quality and the same orientation. Utilizing broadband
wiggler radiation in 2nd generation synchrotrons (F1/DORIS), Laue diffraction patterns
have been collected. Tapered undulator radiation of synchrotrons of the 3rd generation
yields quasi pink-Laue diffractograms (ID09/ESRF).

Fig. 17.4 References [17-52]: Evolution of the average brilliance of synchrotron radiation light from
synchrotrons of the 2nd generation (DORIS) through the ESRF towards the LCLS free-electron laser,
characterized by the high-resolution diffraction pattern of the same molecular crystal system with
the same crystal quality (courtesy of DESY, ESRF and LCLS). The figure is adapted from [8-10]
and references therein

Compared to the pink Laue white beam, the FEL radiation at the XPP as well as the CXI
beamline of LCLS is about one to two orders of magnitude smaller in bandwidth,
allowing only the investigation of a statistical number of Bragg reflections for small
molecular crystals when the crystal is rotated.

Small molecular crystallography at free-electron lasers could be quantitatively 
utilized for structure determination by the combination of traditional Laue crystallography
and FEL-specific serial crystallography techniques with rapid sample
exchange, based on a single shot data collection strategy analogous to the developments
of time-resolved Laue diffraction at synchrotrons of the 3rd generation.

In contrast to conventional Laue crystallography, for normalization purposes during
FEL experiments, every diffractogram needs to be associated with an X-ray spectrum
collected online. Utilizing high X-ray energies well above 15 keV (use of the
third harmonic and smaller X-ray/atom cross section) with no monochromatization
in pink Laue mode reduces radiation damage, so that with a monotonically running
spindle and randomly changing X-ray wavelength with known X-ray spectral characteristics,
various orientations under defined X-ray conditions can be collected. They
are sufficient for determining the orientation matrix of small molecules and hence
for the subsequent indexing of the collected diffractograms.

Fig. 17.5 References [17-52]: Ultrafast time-resolved small molecule crystallography at LCLS,
XPP and CXI beamlines (interim set-up installed during the first in-house test phase of LCLS).
Left side, top: three-circle diffractometer used. Right side, top: organic crystals exposed to 9 keV
X-ray radiation (single shot) inducing severe radiation damage and 18 keV radiation allowing for
the collection of various quasi Laue diffractograms with the Laue pink FEL beam under different
orientations. Bottom: small molecule crystals lined up for a three-shot-serial type of crystallography
experiment. The figure is adapted from [8-10] and references therein


Since the studied materials are normally compounds available only in small amounts, in the
presented approach highest quality crystals are stacked behind each other in a capillary
(or other type of sample target holder) and high energy X-ray radiation is
utilized for collecting high resolution diffraction patterns and minimizing accumulated
radiation damage. Due to the monochromaticity of FEL X-ray beams even in
their pink Laue mode, quasi pink Laue diffraction patterns are recorded for various
orientations, allowing a precise determination of the orientation matrix of the small
molecule crystals. Additionally, on a single shot basis, the Bragg peak intensities are
normalized to wavelength and X-ray intensity (Fig. 17.5).

In the current example, unravelling the mechanisms of electron-transfer (ET) induced
structural changes, and the interplay of charge transfer (CT) and structural
reorganisation in complex molecular systems, is of prime importance in both chemical
and biological processes and has gained considerable importance over decades.
ET/CT and the associated structural changes are pervasive in diverse
processes occurring in nature, from light harvesting systems like photosynthesis
to technological innovations such as solar energy converters and optical devices. It is
challenging to trap the fleeting excited or transient species and gain insight
into the unprecedented dynamics, which occur on ultrafast time scales in the range
of attoseconds to picoseconds. Emerging scientific innovations at many synchrotrons and FELs, with
high brilliance and extremely high photon flux, have opened a new domain of
research in investigating many ultrafast processes. Information about what happens
immediately after the triggering of the reaction is satisfactory, but to evaluate what
happens next is crucial. In order to tackle these questions, we must know how fast the
processes happen and what the time scales of each process of interest are.

Fig. 17.6 References [17-52]: Optical laser pump/X-ray probe set-up for ultrafast X-ray crystallography.
By varying the time-delay between optical laser pump and X-ray probe beam for different
crystallographic orientations, X-ray diffraction snapshots as a function of time are collected. Adding
the diffraction signals for one time-delay into one Ewald sphere allows an analysis of structural
evolution as a function of time

By employing the time-resolved X-ray diffraction (TR-XRD) technique, the CT/ET
reaction has been initiated by a laser pump pulse from the ground state to the excited
state or the CT/ET state. The subsequent photo-induced structural changes have been
monitored utilizing the pulsed structure of the X-ray beam by varying the time delay
between the optical laser pump and X-ray probe pulses (pump/X-ray probe scheme),
as shown in Fig. 17.6.

In the following, examples of structural dynamics of organic and metallo-organic
systems studied by trapping charge transfer states using high brilliance pink Laue
time-resolved X-ray diffraction are presented. The photo-induced dynamics has been studied
for systems ranging from simple organic donor-acceptor molecules (Fig. 17.7) to complex
inorganic single crystalline systems. In the former, the structural changes are induced
by an ET/CT from an organic donor to an organic acceptor moiety, and in the latter they
are induced by a metal to ligand charge transfer (MLCT) (see following chapter).
Due to their high photo-conversion efficiencies and lifetimes, the former are highly suited
for solar cell developments, and the latter are highly suited as photo-catalysts.

The experiments have been performed combining high resolution static single
crystal XRD studies utilizing a home-source (Bruker Apex II) system and the
P11 beamline of PETRA III at DESY. To map out the details of the potential
landscapes, ultrafast, light-induced Laue diffraction studies of the pure organic
system as well as the metal complex have been performed at BioCARS, 14-ID
beamline/Argonne (picosecond pink-Laue), ID09/ESRF (picosecond pink-Laue),
XPP/CXI/LCLS (femtosecond pink-Laue) and P11/PETRA III (picosecond,
monochromatic). In the pure organic system, electron density migration occurs within
a few hundred picoseconds, and the structural changes are clearly visible from
the photo difference map with prominent differences in the electron density maps at
the electron donor and electron acceptor system (Fig. 17.7). In the metal complex, it
has been observed that upon photo-excitation the electron migration or the charge
migration is mostly on the atoms of the organic ligands proximal to the metal
center (not shown here).

Fig. 17.7 References [17-52]: Molecular movie of a non-conventional molecular diode which
cannot be described within the Born-Oppenheimer approximation. Within femtoseconds and nearly
immediately after optical photo-absorption the electrons are rearranged towards a conducting state
while the structure kinks into a tilted configuration. The figure is adapted from [8-10]
and references therein

Figure 17.7 depicts the refined result of such an experiment: the femtosecond
structural dynamics or “molecular movie” of a molecular crystal which consists of
only light elements (carbon, nitrogen). The patented system has the most efficient
optical light/electron transfer rate possible (100%), by utilizing quantum effects such 
as electron and structural dynamics pathways which cannot be described through the 
conventional Born-Oppenheimer approximation. 


17.4 Applications in Energy Research 


Understanding this “beyond Born-Oppenheimer” behaviour and combining the properties
of this type of system with smart semiconducting plastic-type compounds, it
is possible to build flexible, solid, plastic-type solar cells and organic light emitting
diodes with very high efficiency (Fig. 17.8).

Both as a consequence of successive technical developments, and based on the
chemical rules and time laws derived during the various method development processes,
it has become possible to optimize the functional performance of organic solar cell
materials and devices (Fig. 17.8).

Fig. 17.8 References [53-59]: Once structural relaxation processes are understood the derived
structural and dynamical properties can be utilized to improve functional dynamics performance
allowing developing new classes of organic solar cells (OSC) (see current-voltage (I-V) curve) or
organic light emitting diodes (OLED) based on plastic (see inset). The figure is adapted from
[8-10] and references therein


By moving the fundamental method developments into the application regime
(again as a proof of the method development), the circle closes on the promise that 3rd
and 4th generation X-ray sources may help to optimize strategies for the performance of
modern materials. In order to test whether the structural dynamics laws derived through
the “molecular movie” approach can have direct consequences for applications of
the molecular system, electronic devices such as organic solar cells and organic light
emitting diodes have been built, and Fig. 17.8 presents such efforts.

As can be seen from the current-voltage (I-V) curve, when the system is combined with
semiconducting plastic material (which by itself is not photoactive), efficient organic
solar cells that are mechanically highly flexible and cheap to produce can be designed.

Since the time-resolved X-ray methods allow for disentangling “local-to-global”
and “global-to-local” structural responses, desired functional actions of a device
like energy storage can be distinguished from “energy-eating” processes based on
non-desired heating and energy quenching processes. Figure 17.7 presents the structural
mechanisms underlying an optimized all-over organic solar cell. Small atomic
changes on the light-absorbing chromophore unit lead to a complete switching of its
functional dynamics, from light absorbing solar cell devices [53-59] to a light emitting
organic diode [54]. In another example, the understanding of the crystallization
processes of organic material gained from time-resolved X-ray diffraction (TR-XRD) studies
has influenced the optimization of the recycling process of molten PET
bottles to ultra-hard polyethylene [56]. Such ultra-hard plastic material is currently
used in every second wind turbine produced world-wide.

The current examples emphasize, however, that real-world functional materials,
pharmaceuticals, catalysts or energy-converting materials are not always crystalline
and are far from the ideal periodic arrangement that model investigations may
suggest. If the intrinsic spatial resolution of the system does not allow such
detailed investigations, a combination of ultrafast X-ray spectroscopy and ultrafast
X-ray diffraction or scattering, the “local to global approach”, delivers
configuration and charge information on the molecules studied [17-52, 60-76]. This
approach is described in the following section.


17.5 From Local to Global: Ultrafast Multidimensional 
Soft X-ray Spectroscopy and Ultrafast X-ray 
Diffraction Shake Their Hands 


As ultrafast X-ray spectroscopy and X-ray diffraction can “shake hands” in X-ray
free-electron laser science, they open up new ways to study complex chemical
reactions.

X-ray diffraction [17-59] and X-ray spectroscopy [60-76] are complementary
techniques. X-ray spectroscopy probes electronic properties in an element-,
orbital- and site-specific way, for example changes in bonding or oxidation state
[63]. Since it yields local information on the system under investigation, the
method is referred to as the local approach. X-ray diffraction probes the structural
changes of the whole bulk and is termed the global approach. Applying both methods
yields more complete information on the overall structural properties of the target
system. However, to follow whole reaction pathways or reaction intermediates
between the start and end of a reaction, the experimental approaches have to be
extended by time resolution, i.e. the pump-probe scheme, as described in the
previous section and shown in Fig. 17.9.

With the availability of the first X-ray FEL, FLASH at DESY in Hamburg, 
Germany, it became possible to implement X-ray spectroscopy and diffraction for 
investigating ultrafast chemical reaction processes. In the following, we present two 
pioneering soft X-ray experiments performed at the FEL facilities FLASH and LCLS. 

The first example is a time-resolved X-ray diffraction measurement on
silver-behenate [60-76], with an outlook towards combining the liquid jet with X-ray
spectroscopy; the second example is a femtosecond time-resolved X-ray spectroscopy
experiment on iron-pentacarbonyl (Fe(CO)5) [78-81]. Figure 17.10a shows the
apparatus which has been developed for such experiments at FELs. The ultrafast X-ray
diffraction of a silver-containing redox system embedded in a supramolecular organic
structure has been studied (Fig. 17.10b). The experiment is a proof of principle,
utilizing FEL radiation for ultrafast X-ray diffraction of chemical systems in real time.
By investigating the time evolution of the Bragg reflections (Fig. 17.10c), complex

Fig. 17.9 References [60-76]: Principle of various types of pump-probe experiments in
the optical region (left) and the corresponding experiments in the X-ray region (right). An
optical laser pump initiates a chemical reaction and an optical (left) or X-ray laser pulse (right)
probes the proceeding reaction pathways. Right: Both the X-ray spectroscopy and the X-ray
diffraction signal can be recorded, yielding complementary information. The figure is adapted
from [8-10] and references therein




Fig. 17.10 References [60-76]: a The X-ray photon endstation at the FEL FLASH. The modularly
built endstation has been used for the experiment presented in (b) and (c). b Bragg diffraction
peak (110) of silver-behenate studied by single-shot FEL pulses. c The time-dependent behavior of
the Bragg peak after photo-excitation. The figure is adapted from [8-10] and references therein

photo-induced transformation kinetics, partly photochemically induced and partly
influenced by heat propagation, have been found.

The decay curve depicts the propagation of the ultrafast transformation throughout
the whole material. The black areas represent the non-transformed material, the white
areas the islands of transformed material.


17.6 Applications in Bimolecular Reaction Studies 
and Photocatalysis 


Another experiment in the field of soft X-ray spectroscopy has recently been worked
out for studying photocatalysis. For pioneering time-resolved X-ray spectroscopy
measurements, the FLASH end station in Fig. 17.10 has been modified by adding a
soft X-ray spectrometer in Rowland circle geometry [63, 66, 77-80]. This so-called
Liquid Jet (LJ) end station has then been successfully used in resonant inelastic X-ray
scattering (RIXS) experiments on Fe(CO)5 at the LCLS FEL at SLAC in Stanford,
CA, USA. Similar end stations have been built to perform soft X-ray spectroscopy
experiments at high-flux X-ray facilities at DESY/Hamburg [15] and HZB/Berlin.
In the above-mentioned experiment [77-80], the dissociation of iron-pentacarbonyl
(Fe(CO)5) in ethanol has been studied in real time. After optical excitation with
266 nm photons (Fig. 17.11a), Fe(CO)5 dissociates into iron-tetracarbonyl (Fe(CO)4)
and carbon monoxide (CO) with solvent assistance. For every time delay, the incident
monochromatic FEL photon energy has been scanned over the Fe 2p edge, and the
X-ray emission spectra have been recorded.

The time evolution was deduced from differences between the pumped (positive
time delay) and the un-pumped (negative time delay) Fe(CO)5 emission spectra.
Figure 17.11 summarizes the ethanol-assisted Fe(CO)5 photo-dissociation pathways
and compares simulations to the experiment. The complexity of the reaction increases
due to the formation and decay of a triplet state; Fig. 17.11b-e summarises the
evolution of the valence-electronic structure of Fe(CO)4 in ethanol upon femtosecond
spin crossover and complex formation.

The involved orbitals are assigned according to their Fe 2p and 3d, or ligand 2p
character, and according to their symmetry along the Fe-CO bonds. The star (*)
in Fig. 17.11b marks the antibonding orbitals in the electron configuration of the
Fe(CO)5 ground state and in the single-electron transitions of the laser pump-X-ray
probe processes. The dissociation (Fe(CO)5 → Fe(CO)4 + CO) is initiated by
the optical dπ → 2π* excitation. The RIXS measurements at the Fe L3 absorption
edge to final valence-excited states involve probing the dπ → dσ* transition.
Figure 17.11(c, top) shows the difference RIXS spectra (RIXS intensity as a function
of incident photon energy and energy transfer in eV) of the summed pumped and
un-pumped sample. Figure 17.11(c, bottom) displays the time structure in the regions (1-4)
in comparison to simulated populations of the excited (E) and triplet (T) states and
the ligated complex (L). Figure 17.11d shows the measured RIXS spectra for Fe(CO)5

Fig. 17.11 References [77-80]: a Fe(CO)5 photo-dissociation. b-e Evolution of the Fe(CO)4/EtOH
valence-electronic structure upon femtosecond spin crossover and complex formation. Term scheme
of the laser pump-X-ray probe experiment (b), measured (c, d) and simulated (e) RIXS
spectra, with and without an included kinetic model of the excited state (E), triplet state (T) and
ligated complex (L) of Fe(CO)4. d The experimental Fe L3-RIXS intensities (encoded in color)
versus energy transfer and incident photon energy. (d, top) Fe(CO)5 ground state (negative delays,
probe before pump, scattering to dπ⁷dσ*¹ and dπ⁷2π*¹ marked by circles). (d, middle and bottom)
Difference intensities for 0-700 fs and 0.7-3.5 ps time delays, respectively. e Calculated Fe
L3-RIXS intensities (color coded as in (d)) and molecular-orbital diagrams of Fe(CO)5 (ground state
and hot), excited, triplet, and singlet Fe(CO)4 (all three non-complexed) and solvent-complexed
singlet Fe(CO)4-EtOH. The figure is adapted from [8-10] and references therein


at negative time delays and the difference after delay intervals of 0-700 fs and
0.7-3.5 ps, respectively (color-encoded). By subtracting the negative-time-delay
spectrum with a weight of 0.9, the pumped contributions are isolated (top: Fe(CO)5
ground state, negative delays, probe before pump, scattering to dπ⁷dσ*¹ and dπ⁷2π*¹
marked by circles; middle/bottom: difference intensities for 0-700 fs and 0.7-3.5 ps
time delays). The experimental data are compared to the RIXS calculations displayed in
Fig. 17.11e, which depicts simulated Fe L3-RIXS intensities and molecular-orbital
diagrams of Fe(CO)5 in the ground and hot state and of Fe(CO)4 in various
valence-excited states.
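
The weighted subtraction used to isolate the pumped contributions can be sketched in a few lines of Python. This is a minimal illustration and not the analysis code of the cited work: the function name isolate_pumped_signal, the list-based spectrum layout and the toy intensity values are assumptions made here; only the weight of 0.9 is taken from the text.

```python
def isolate_pumped_signal(pumped, unpumped, weight=0.9):
    """Return pumped - weight * unpumped, channel by channel.

    pumped, unpumped: equally long sequences of RIXS intensities
        (hypothetical layout: one value per energy-transfer channel).
    weight: fraction of the un-pumped (negative-delay) spectrum that is
        subtracted; 0.9 is the value quoted in the text.
    """
    if len(pumped) != len(unpumped):
        raise ValueError("spectra must have the same number of channels")
    return [p - weight * u for p, u in zip(pumped, unpumped)]


# Toy numbers for illustration only (not measured data).
unpumped = [10.0, 20.0, 5.0]
pumped = [9.5, 18.0, 7.0]
difference = isolate_pumped_signal(pumped, unpumped)
```

Channels in which the difference is positive gain intensity after the pump, channels with a negative difference are depleted. A weight below one is commonly read as the share of the signal that still stems from unexcited molecules; that interpretation is an assumption of this sketch.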

The 2p → LUMO resonance positions and dπ → dσ* RIXS transitions are marked
by arrows. The RIXS pattern at early times can only be reproduced when a
complex of Fe(CO)4 with the solvent is taken into account.


17.7 Applications in Unimolecular Liquid Phase Reaction
Dynamics 


In the hard X-ray regime [81-89], the “local to global” approach has also been estab- 
lished by combining X-ray spectroscopy with X-ray scattering techniques. The ultra- 
fast structural dynamics of various metal organic systems has been studied applying 
these techniques [84, 88, 89]. 


17.8 Applications in Bioelectronics, Aqueous and 
Prebiotics Reaction Dynamics 


In bioelectronics, aqueous and prebiotic reaction dynamics and biophysics,
ultrafast photon-in/photon-out developments utilizing high-flux X-ray sources allow
investigating the properties of biorelevant solvents and proteins during their structural
reactions [90-108, 115, 116, 118, 121]. The examples from the literature include
model systems ranging from small molecules to macromolecules, studied with
ultrashort, high-flux X-ray sources: X-ray scattering on quasi-periodic
systems (Fig. 17.12) [90, 94-103], X-ray scattering on macromolecules (Figs. 17.13
and 17.14) [90, 115, 116, 118, 121] and X-ray spectroscopy (Fig. 17.15)
[91-93, 104-108].

Two examples of dynamical studies of soft condensed matter via high-flux
time-resolved X-ray scattering will be given [90-108, 115, 116, 118, 121]: first, a
study of the dynamics of a liquid crystal-to-microemulsion phase transition
(Fig. 17.12) [95], and second, a real-time study of complex protein dynamics
(Figs. 17.13 and 17.14) [115, 116].

In liquid-crystal-type soft condensed matter systems (Fig. 17.12), a manifold of
phase-ordering transitions relevant to chemical and biological systems occurs, ranging
from liquids to self-assembled soft solids (like membranes or liquid crystals). In the
present case, the dynamics of the driving forces (activation energy and entropy) of a
liquid crystal-to-microemulsion phase transition have been studied (Fig. 17.12 (left)).
The purpose of this work was to clarify how the concentration of the amphiphilic
molecules influences the nature of these self-assembly processes.

By photosensitization of the model system (polyalkylglycolether (C10E4), water,
decane, and cyclohexane) with laser dyes, the phase transition could effectively be
photo-induced and controlled through the absorption of optical photons (as a
photo-induced phase transition, PIPT). The photo-transformation conditions were
chosen such that the system started from thermal equilibrium.

By applying time-resolved photo small-angle X-ray scattering it has been found 
that the conversion process depends on the surfactant concentration and the activation 
energy, which is observable through the length of the induction time (Fig. 17.12 
(right)). The phase-transition, though photo-triggered, is still diffusion controlled in 
the rate-determining steps. 
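
The kinetic picture reported for this transition, an induction time followed by monoexponential nucleation and growth, can be written as a small model function. The sketch below is only an illustration under stated assumptions: the function name lc_fraction, the parameter names and the example values are hypothetical, and the functional form is the simplest one compatible with the description, not a fit to the published data.

```python
import math


def lc_fraction(t, induction_time, rate):
    """Transformed (LC) fraction at time t after the optical trigger.

    The signal stays at zero during the induction time and then grows
    monoexponentially towards 1:
        f(t) = 1 - exp(-rate * (t - induction_time))   for t > induction_time
    All quantities are in arbitrary but mutually consistent time units.
    """
    if rate <= 0:
        raise ValueError("rate must be positive")
    if t <= induction_time:
        return 0.0
    return 1.0 - math.exp(-rate * (t - induction_time))


# Hypothetical parameters: a longer induction time stands in for a
# different surfactant concentration; the growth rate is kept equal.
times = [0.0, 1.0, 2.0, 4.0, 8.0]
short_induction = [lc_fraction(t, 1.0, 0.5) for t in times]
long_induction = [lc_fraction(t, 3.0, 0.5) for t in times]
```

Comparing the two traces shows how a longer induction time shifts the onset of the Bragg-peak growth without changing its shape, which is how the concentration dependence would appear in such a model.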

Fig. 17.12 References [90-95, 95-108, 115, 116, 118, 121]: (Left) Phase diagram of the system
C10E4, water, n-decane, and cyclohexane at equal volume fractions of water and oil. Cyan represents
an area where two phases coexist, dark gray the ME phase, and white the LC phase. The dots
represent points on the phase boundaries. Arrows indicate the concentrations of the PIPT experiment.
Inset: schematic drawings of the corresponding phases with the assignment of characteristic
length scales such as the correlation length ξ (ME) and the distance d between the lamellae (LC).
(Right) Photoinduced ME-LC phase transition investigated by TR-SAXS. The decrease of the ME
phase and the increase of the lamellar phase result in a concomitant increase of the observed
Bragg peak. The monoexponential nucleation and growth period is preceded by a
concentration-dependent induction time. Inset: Traces of the LC scattering. The figure is adapted
from [8-10] and references therein


In the second example, X-ray scattering techniques, comprising small-angle and
wide-angle X-ray scattering (SAXS/WAXS), are increasingly used to characterize
the structure and interactions of biological macromolecules and their complexes
in solution (Figs. 17.13 and 17.14) [90, 94-103, 115, 116, 118, 121]. They are
a method of choice for characterizing flexible, partially folded and unfolded protein
systems. X-ray scattering is the last resort for proteins that cannot be investigated
by crystallography or NMR, and it complements other biophysical techniques in
answering challenging scientific questions. The marriage of X-ray scattering with
the fourth dimension, “time”, yields structural dynamics and kinetics information
on protein motions on hierarchical timescales from picoseconds to days (Fig. 17.13).

In this time-resolved mode of X-ray scattering (Fig. 17.13), the accessible
timescales range from hours down to femtoseconds, even for investigating
non-photoactive protein dynamics with different time-resolved X-ray scattering
techniques. The various time-resolved and in-situ X-ray scattering techniques listed
are complementary to the photon-triggered time-resolved X-ray scattering techniques
and can be adapted to structural dynamics investigations of ubiquitous non-photoactive
proteins.

Depending on the time scale of the system studied, it has furthermore been
demonstrated that X-ray scattering techniques like diffuse X-ray scattering can be
merged with pressure jump, temperature jump, electric field modulation,
and structural freezing methods or, on the chemical modulation side, with rapid
mixing or photo-switching methods.

Fig. 17.13 References [90-108, 115, 116, 118, 121]: The “fourth dimension: time” of the X-ray
scattering technique. The timescales accessible for investigating non-photoactive protein dynamics
with different time-resolved X-ray scattering techniques and the typical signal changes are
presented with representative published literature. a Inherent, b pressure-jump, c electric-field
modulations, d electromagnetic field modulations, e structural freezing, f SAXS coupled with
size-exclusion chromatography (SEC) and g rapid mixing with the help of microfluidic
devices. The figure is adapted from [8-10] and references therein


Figure 17.13 summarises typical X-ray scattering signal changes: inherent (a),
upon pressure jump (b), in electric-field modulations (c), in electromagnetic field
modulations (d), upon structural freezing (e) and in small-angle X-ray scattering
experiments coupled with size-exclusion chromatography (SEC) (f). Special attention
is given to rapid-mixing approaches with the help of microfluidic devices (g)
(Fig. 17.14). Various techniques listed in the summary figure have been developed
independently in neighboring projects of the collaborative research center (see the
references and references therein).

Figure 17.14 presents a scheme of a rapid-mixing time-resolved small-angle
X-ray scattering experiment for investigating the ubiquitin unfolding process
as a model system and model process. Figure 17.14a presents the 20-microchannel
continuous-flow microfluidic device and Fig. 17.14b the time-resolved small-angle
X-ray scattering (SAXS) setup for the kinetic ubiquitin unfolding studies (proving
the secondary structure unfolding on the millisecond time scale), followed by

Fig. 17.14 References [91-93, 104-108]: Rapid-mixing TR-SAXS experiments to investigate the
ubiquitin unfolding process. a 20-microchannel continuous-flow microfluidic device. b TR-SAXS 
setup for the kinetic ubiquitin unfolding studies (A: ubiquitin solution, B: 8 M guanidinium-HCl 
solution, C: water, D: 20-microchannel microfluidics device, E: external glass capillary (X-ray 
probing area), F: capillary holder, G: X-ray beam, H: detector). c The integrated rapid mixing 
SAXS experimental setup at the cSAXS beamline, PSI Switzerland. The figure is adapted from 
[8-10] and references therein 


the integrated rapid mixing SAXS experimental setup at the cSAXS beamline, PSI 
Switzerland. 

The third example [91-93, 104-108] concentrates on the use of multidimensional
X-ray spectroscopy for studying dynamics in soft condensed matter, here
the dynamics of ions in aqueous solution (Fig. 17.15). Hydration shells around ions
are crucial for many fundamental biological and chemical processes. Their local 
physicochemical properties are quite different from those of bulk water and hard to 
probe experimentally. 

By combining high-resolution soft X-ray spectroscopy using liquid jet technology
as a core-hole-clock spectroscopy method (Fig. 17.15 (left)) with molecular dynamics




[Plot residue of Fig. 17.15 (right): normalized intensity (a.u.) versus emission energy (518-528 eV) for H2O and 4 M MgCl2(aq); spectral features labeled 3a1 and 1b1]

Fig. 17.15 References [90-108, 115, 116, 118, 121]: (Left) Principle of core-hole-clock
spectroscopy/resonant inelastic X-ray scattering for investigating soft condensed matter reaction
dynamics. (Right) The difference spectrum between 4 M MgCl2 and pure water. The difference
represents the first hydration shell of water molecules around Mg²⁺ on the time scale of the oxygen
core-hole clock (5 fs). The figure is adapted from [8-10] and references therein


simulations and ab initio electronic structure calculations, the water-ion
interaction in MgCl2 solution has been elucidated at the molecular level.

The results reveal that salt ions mainly affect the electronic properties of water
molecules in their close vicinity. Furthermore, the oxygen K-edge X-ray emission
spectrum of water molecules in the first solvation shell differs significantly from that
of bulk water. Ion-specific effects are identified by fingerprint features in the water
X-ray emission spectra. While Mg²⁺ ions cause a bathochromic shift of the water
lone pair orbital, the 3p orbital of the Cl⁻ ions causes an additional peak in the water
emission spectrum at around 528 eV (Fig. 17.15 (right)).


17.9 Applications in Biophysics and Gas Phase 
Biomolecules 


The preceding sections presented the combination of X-ray scattering techniques
like diffuse X-ray scattering with pressure jump, temperature jump, electric field
modulation, and structural freezing methods or, on the chemical modulation side,
with rapid mixing or photo-switching methods, and their successful application
to biophysical questions [90, 94-103, 115, 116, 118, 121]; the application of
core-hole-clock and multidimensional X-ray spectroscopy techniques to aqueous
and prebiotic questions [91-93, 104-108]; and novel liquid jet developments for
biophysical research [120]. Beyond these, various other coupling techniques have
been successfully demonstrated in X-ray spectroscopy.

The combination of synchrotron or free-electron laser radiation with techniques
such as electrospray ionization mass spectrometry allows deriving entirely novel
experimental techniques for investigating macromolecules [109-121]. For example, a
mass spectrometric study on gas-phase ubiquitin at FLASH has revealed a fast local
structural response, leading to small fragments with yields increasing linearly with
photon intensity [109, 119].

Fig. 17.16 References [109, 111-114, 119]: The mobile SMIS setup combines an electrospray
ionization source, radio frequency (RF) mass filters, an RF trap and a mass spectrometer, which as
a unit can be interfaced with different photon sources. The figure is adapted from [8-10, 114] and
references therein

Investigating the interaction of light with biologically relevant molecules has
gained interest in a wide variety of research fields, including photochemical
reactions such as light harvesting as well as radiation damage in proteins and DNA
related to cutting-edge cancer treatment techniques. However, in the condensed phase,
disentangling direct and indirect radiation effects is often difficult [109-121].

In the beginning, studies on isolated polyatomic molecules in the gas phase were
limited to small molecules that are stable against thermal decomposition. In order
to advance to more complex biomolecular systems, a novel apparatus has been
designed. This mobile setup combines an electrospray ionization source, radio
frequency (RF) mass filters, an RF trap and a mass spectrometer, which as a unit can
be interfaced with different photon sources (see Fig. 17.16). Electrospray ionization
introduces biomolecular ions from solution into the gas phase, allowing for studies
of molecular systems in a well-defined state [109, 111-114, 119].

The coupling of electrospray ionization sources with synchrotrons [109, 111-114]
and free-electron lasers [119] opens the way to investigating the electronic
structure of biomolecular systems and to a detailed description of their relaxation
mechanisms in the VUV and soft X-ray energy range. The wide photon energy range
available at synchrotrons enables systematic studies of ionization and dissociation
as a function of photon energy. Inner-shell excitations provide a localized
site of energy deposition.

The extremely high photon flux and fs pulse duration offered by free-electron 
lasers allow studying the molecular properties in intense fields. Furthermore, using 
the assets of free-electron lasers in a pump-probe scheme enables the study of the 
dynamics of charge migration and charge transfer within gas-phase biomolecules. 

In the VUV to soft X-ray regime, strong indications have been found that
photoabsorption within small peptides induces fast loss of aromatic amino acid side
chains through repulsive states, occurring before redistribution of the internal energy.
The very fast loss of side chains thus efficiently cools the residual peptide, and may
enable the survival of early functional peptide substructures under photon irradiation.
This dissociation is mainly caused by photoabsorption in the peptide backbone, which
seems to trigger ultrafast charge migration to the side chains without fragmenting
the backbone itself [111-114, 119].

Subsequently, X-ray/peptide and X-ray/protein interactions have been studied
as a function of peptide size. Small peptides like leucine enkephalin [109, 112] (five
amino acid residues) dissociate into small immonium fragments, whereas bigger
proteins like cytochrome c (106 amino acid residues) undergo only multiple ionization
and no fragmentation. The size dependence has been explained by considering that
the average excess energy left in the peptide is redistributed over the ro-vibrational
modes of the molecule and averaged over the various degrees of freedom. Larger
proteins can therefore accommodate much more energy, because they have more
degrees of freedom over which to redistribute it.
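
The size argument can be made concrete with a rough count of vibrational modes. The sketch below is an order-of-magnitude illustration only: it uses the standard 3N - 6 count for a non-linear molecule with N atoms, the atom count of leucine enkephalin from its molecular formula C28H37N5O7, and two clearly hypothetical inputs, namely a round atom count for a protein of about 100 residues and an excess energy of 20 eV. None of these numbers are taken from the cited studies.

```python
def vibrational_modes(n_atoms):
    """Number of vibrational modes of a non-linear molecule: 3N - 6."""
    if n_atoms < 3:
        raise ValueError("need at least three atoms for a non-linear molecule")
    return 3 * n_atoms - 6


def mean_energy_per_mode_ev(excess_energy_ev, n_atoms):
    """Excess energy spread evenly over all vibrational modes (in eV)."""
    return excess_energy_ev / vibrational_modes(n_atoms)


LEU_ENK_ATOMS = 28 + 37 + 5 + 7   # C28H37N5O7, i.e. 77 atoms
LARGE_PROTEIN_ATOMS = 1700        # hypothetical round number, illustration only
EXCESS_ENERGY_EV = 20.0           # hypothetical excess energy, illustration only

small = mean_energy_per_mode_ev(EXCESS_ENERGY_EV, LEU_ENK_ATOMS)
large = mean_energy_per_mode_ev(EXCESS_ENERGY_EV, LARGE_PROTEIN_ATOMS)
ratio = small / large  # how much "hotter" each mode is in the small peptide
```

With these assumed inputs the same excess energy heats each mode of the pentapeptide more than twenty times as strongly as each mode of the larger protein, which is the qualitative reason given in the text for fragmentation of small peptides and survival of large proteins.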

First studies of a gas-phase protein (ubiquitin, 76 residues) at the free-electron
laser in Hamburg, FLASH, have revealed two different photoabsorption regimes:
non-dissociative ionization in the few-photon regime (pulse energy of 0.1 µJ), in
contrast to side-chain losses in the multi-photon regime (pulse energy of 2.3 µJ) [119].
The yields of these side-chain fragments increase linearly with the number of photons
in the pulses. No region has been found where intermediate fragments due to backbone
scission prevail. These effects suggest that in the XUV multiphoton regime, proteins
react as an ensemble of small peptides, losing the side chains in fast local
fragmentation processes.

Near edge X-ray absorption fine structure spectroscopy (NEXAFS) probes
transitions between atomic core levels and orbitals of the molecular bonding states of
intra-molecular neighbours. NEXAFS is therefore a powerful structural tool that
provides information on the electronic structure. Data taken by near edge X-ray
absorption mass spectrometry (NEXAMS) of gas-phase oligonucleotides [114],
peptides [109, 112] (and references therein) and proteins [114c, 118, 119] show (π*, σ*)
transitions and Rydberg states similar to conventional NEXAFS spectra of thin films
and liquids. Additional information on structural and dissociation dynamics within
the molecules can be extracted from the NEXAMS spectra of the individual ionization
and dissociation products. Moreover, the gas-phase NEXAMS method can be
sensitive to the secondary structure of proteins, as shown in the study of melittin [113].

In the photoexcitation case of the protein melittin, the molecules end up singly
ionized after excitation of the electron and emission of a single Auger electron. The
yields of the singly ionized parent peak for different initial charge states of melittin
versus photon energy around the carbon 1s edge are shown in Fig. 17.17 (top). In the
photoionization case, the molecules end up doubly ionized after ionization of the
electron into the continuum and emission of a single Auger electron. The yields of the
doubly ionized parent peak are shown in Fig. 17.17 (bottom).

Fig. 17.17 References [109, 111-114, 119]: Yields/NEXAMS of different initial charge states of
melittin versus photon energy. (top) NEXAMS spectra of the singly ionized parent peak around
the carbon 1s edge. (bottom) NEXAMS spectra of doubly ionized melittin after ionization of the
electron into the continuum and emission of a single Auger electron


A decrease of the resonance (1s → π*CO) with increasing charge state is observed,
a trend opposite to that in the photoexcitation case. The NEXAMS properties and
photoionisation behaviour are explained by additional ionization of the molecule by
secondary electrons, with the probability of the molecule being hit by these electrons
depending on its geometry. Consequently, in a helical and more compact structure
with charge state 2+, the molecule has a higher probability of being hit again by an
escaping electron than in linear structures with four charges.

Fig. 17.18 References [122, 122-127, 127-163]: FLASH/CFEL-ASG MultiPurpose (CAMP)
endstation at the AMO beamline at the LCLS in 2012. Since 2014, CAMP has been a permanent
endstation at FLASH/BL1 at DESY in Hamburg. Various experimental techniques can be employed in
the instrument, including novel time-resolved X-ray studies of chemical reactions in the gas phase.
On the right side, CAMP’s double-sided velocity-map-imaging spectrometer is shown. The figure
is adapted from [8-10] and references therein


17.10 Ultrafast Imaging of Unimolecular Gas-Phase 
Reactions 


The advent of X-ray free-electron lasers enabled not only novel studies in the con- 
densed phase, as described in the first parts of this chapter, but also brought forward 
unprecedented possibilities to study dynamical processes in the gas phase. The very 
short and very intense X-ray pulses made it possible, for the first time, to probe ultra- 
fast photo-induced molecular dynamics by electron or ion momentum spectroscopy 
following multiple, element-specific inner-shell absorption. 

These new 4th generation light sources also called for novel, dedicated instrumen- 
tation (Fig. 17.18). To this end, different endstations have been developed, initially in 
particular for the use at the atomic, molecular, and optical physics (AMO) beamline, 
which was the first beamline to become operational at the LCLS [122, 122-163] in 
2009. Several of the early, pioneering experiments from 2009 to 2012 have been per- 
formed at the AMO beamline in the CFEL-ASG MultiPurpose (CAMP) endstation 
[125, 127], see Fig. 17.18, which was developed within the Max-Planck Advanced 
Study Group (ASG) at the Center for Free-Electron Laser Science (CFEL) in 
Hamburg. Experiments in this instrument range from X-ray imaging of biomolecules, 
nanocrystals, and clusters [122, 125, 127-150] to (time-resolved) ion and electron 
spectroscopy on atoms and small gas-phase molecules [122, 122-127, 151-163]. 
Since 2014, CAMP has been a permanent user endstation at FLASH/BL1 at DESY in
Hamburg [124], and its successor, LAMP, has become operational at the AMO beamline
at the LCLS [16b].


17.11 Applications in Nanoscience and
Multiphoton-Ionisation 


Moreover, the high-field physics (HFP) instrument has been developed at the LCLS 
[128-150], housing an ion momentum spectrometer and several electron time-of- 
flight spectrometers mounted under different angles. It also offers a pulsed, super- 
sonic molecular beam, delivering gas-phase molecules to the interaction region. The 
examples given in the following, as well as multiple other pioneering gas-phase 
studies have been conducted in the HFP instrument [122-150]. 


17.12 Applications in Unimolecular Gas Phase Dynamics 


Gas-phase FEL experiments allow questioning fundamental definitions in chemistry, 
for example the investigation of processes beyond the Born-Oppenheimer approx- 
imation. As an example, a UV-pump, X-ray probe study of two complementary 
halomethane molecules with different photochemistry is shown here [122, 122-124, 
126, 151-163]. 

In Fig. 17.19, the schematic potential energy curves (PECs) of iodomethane
(CH3I) and fluoromethane (CH3F) are displayed, illustrating that the different
halogen species give rise to qualitatively different PECs [122, 152-155, 161, 163]. One
reason for this is the considerably different electronegativity of iodine and fluorine,
which stabilizes the C-F bond in contrast to the C-I bond. Upon absorption of one
267 nm UV photon, CH3I dissociates into two neutral fragments, CH3 + I, whereas in
CH3F, no PEC is resonantly accessible at 4.6 eV. In the latter case, absorption of at
least three UV photons in the same molecule populates several higher-lying ionic PECs,
also resulting in dissociation of the molecules.
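
The photon energies behind this argument follow from E = hc/λ. The short sketch below converts the 267 nm pump wavelength into electron volts and sums three photons; the helper name photon_energy_ev is an assumption made for this illustration, and the constant is the usual value of hc expressed in eV nm.

```python
HC_EV_NM = 1239.841984  # h * c expressed in eV * nm


def photon_energy_ev(wavelength_nm):
    """Photon energy in eV for a vacuum wavelength given in nm."""
    if wavelength_nm <= 0:
        raise ValueError("wavelength must be positive")
    return HC_EV_NM / wavelength_nm


one_photon_ev = photon_energy_ev(267.0)   # close to the 4.6 eV quoted in the text
three_photon_ev = 3 * one_photon_ev       # energy deposited by three UV photons
```

One photon carries roughly 4.6 eV, enough to reach the repulsive neutral state of CH3I, while three photons together supply close to 14 eV, which is the energy scale on which the ionic states of CH3F become accessible according to the text.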

After a tunable time delay, an intense X-ray pulse (727 eV, 1 mJ) probes the 
dissociating system by ionizing predominantly the iodine (3d) or the fluorine (1s) 
level, respectively, because of their large absorption cross section (3.3 Mb for I and 
0.4 Mb for F, compared to 0.1 Mb for CH3), resulting in a localized positive charge 
on the halogen [122, 152-155, 161, 163]. 

At these very high X-ray intensities, a single molecule can absorb many photons, 
such that very highly charged ions up to I?!*/F* and C** are created. 

Because the charge is initially created locally at the halogen atom, the fact that
highly charged carbon ions are also detected already shows that the charge rearranges
within the molecule before or during the fragmentation.

In Fig. 17.20, the calculated electrostatic potentials of an I°* + CH3 are plotted 
for three different internuclear distances between the two fragments, together with 
the binding energy of the highest occupied orbital. In the intact molecule (a), the 
electrons are delocalized, but as the fragments move apart the potential barrier rises, 
until at a certain critical distance (b), it reaches the electron binding energy. 

Fig. 17.19 References [122-127, 151-163]: Schematic potential energy curves for
iodomethane (left) and fluoromethane (right). Absorption of one 267 nm photon in CH3I leads
to resonant population of a repulsive neutral state, whereas multi-photon UV absorption in CH3F
populates several higher-lying ionic states. After a given time delay, these states are probed by
Coulomb explosion following inner-shell ionization of the respective halogen atom by one or
several X-ray photons, probing the transition from a molecule to isolated atoms. The figure is
adapted from [8-10] and references therein


Therefore, for larger distances (c) the electrons can classically be regarded as 
localized at one of the two fragments. It is this transition from a bound molecule to 
isolated atoms that is probed by time-resolved ion spectroscopy [122, 152-155, 161, 
163]. 

The delay-dependent time-of-flight peaks of selected ions of iodomethane and 
fluoromethane are shown in Fig. 17.21. It is evident that the fragmentation patterns 
of the two molecules are qualitatively different. For iodine charge states >I**, the 
appearance of low-energy ions at positive delays is clearly visible (channel 3 in 
Fig. 17.21a).

Signatures of long-distance intramolecular electron transfer have been observed
for both CH3I and CH3F, and the reconstructed critical distances (up to 15 Å for
I21+) are in good agreement with a classical over-the-barrier model.
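
The classical over-the-barrier estimate referred to here can be sketched numerically. In atomic units, an electron bound with energy I_p to a neutral fragment can classically cross to an ion of charge q as long as the internuclear distance satisfies R <= (1 + 2*sqrt(q)) / I_p. The function name and the methyl ionization energy of about 9.84 eV are assumptions introduced for this illustration; the model variant used in the cited work may differ in detail.

```python
import math

HARTREE_EV = 27.211386    # 1 hartree in eV
BOHR_ANGSTROM = 0.529177  # 1 bohr in angstrom

def critical_distance_angstrom(ion_charge, binding_energy_ev):
    """Largest distance at which one electron can classically pass over the
    barrier from a neutral fragment to an ion of charge `ion_charge`."""
    binding_au = binding_energy_ev / HARTREE_EV
    r_au = (1.0 + 2.0 * math.sqrt(ion_charge)) / binding_au
    return r_au * BOHR_ANGSTROM

# Assumed ionization energy of the methyl radical: about 9.84 eV.
for q in (5, 10, 21):
    print(q, round(critical_distance_angstrom(q, 9.84), 1))
```

Under these assumptions, a charge of 21 gives a critical distance of roughly 15 Å, and the distance grows with the square root of the ion charge.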

These ions originate from the pump-probe process as indicated for iodomethane 
in Fig. 17.21, and can be used to extract the critical internuclear distance, up to which 
electron transfer from methyl to iodine is classically allowed for a given charge state 
[122, 152-155, 161, 163]. 

Two other channels can be seen in Fig. 17.21 that correspond to Coulomb explo- 
sion of intact molecules by only the FEL (1) and to ionic dissociation induced by 
multi-photon UV absorption (2), as illustrated for fluoromethane in Fig. 17.21, which 
also occurs with a lower probability in CH3I. The low-energy channel is absent in 
the fluorine ions. 

Fig. 17.20 References [122-127, 151-163]: Calculated Coulomb potentials for an 1% atom
and a neutral methyl radical for a the equilibrium distance, b the critical distance (see text) and c
for isolated atoms (in a classical picture). The dashed blue line indicates the energy of the electron
in the highest occupied orbital. The figure is adapted from [8-10] and references therein

Fig. 17.21 References [122-127, 151-163]: a-b Time-of-flight spectra as a function of the
pump-probe delay for selected fragments of iodomethane and fluoromethane. Different
fragmentation channels are indicated by 1, 2, 3 (see text). Additionally, in (c) and (d), calculated
delay-dependent time-of-flight curves are overlaid with the data, corresponding to an asymptotic
kinetic energy of 0.4 eV. Positive delays correspond to the UV pulse arriving before the X-ray
pulse. The figure is adapted from [8-10] and references therein


17.13 Outlook and Conclusion: First High-Repetition
Frequency, Ultrafast Hard and Soft X-ray Studies
of Chemical Reactions at the European X-ray Free
Electron Laser


This chapter has aimed to give a comprehensive summary of the very first,
pioneering developments of chemical research tools utilizing soft and hard X-ray free
electron laser radiation. The structural dynamics and “molecular movie” concept
has been introduced and confirmed in various states of matter and in different types
of chemical reactions: in the gas phase (unimolecular), in the liquid phase (uni- and
bimolecular) and in the solid state.

Novel experimental FEL concepts have been introduced which needed to be developed
for the different chemical systems and reaction classes, and their pioneering
applications in the various fields of FEL chemistry studies (from molecular reaction
studies, through photocatalysis, to energy research and biophysics) have been
demonstrated. All reported experiments were the first of their kind. It
has successfully been shown that it is possible to follow chemical reactions with FEL
radiation in real time in the gas phase, in the liquid phase, and in the solid phase.
Comparisons to other existing techniques developed at free-electron laser sources
have been given.

Owing to the coincidence between the period in which FELs in the soft and hard
X-ray regime first came into operation (2005-2018) and the running time of the
Collaborative Research Center SFB755 Nanoscale Photonic Imaging (2007-2019),
FEL milestones for chemical research have been reported directly and in a timely
manner within the SFB755. From our side, therefore, the SFB covers the most
precious “lifetime” of our experimental efforts so far, making it most important,
essential and special for us as a platform for intellectual exchange and scientific
support, and as a common effort of the Göttingen Campus. Every further development
sketched in the last paragraph can build on the scientific platform supported by
the SFB.

Since the end of 2017, the European X-ray Free-Electron Laser, the first FEL of the
2nd generation, has been operational. It combines MHz repetition rates
with highest brilliance, stable time synchronisation (allowing for few-femtosecond
time-resolved experiments), highest degree of coherence, tunability etc., allowing
for experiments that evolve from the pioneering experiments at 1st generation FELs,
but also for novel experimental strategies employing multidimensional X-ray
spectroscopy or ultrafast inelastic scattering techniques. In the “molecular movie”
field, first feasibility and demonstration experiments have been performed and are
discussed in the context of existing X-ray FEL chemistry studies. In the next 10 years,
novel superconducting LINAC technologies will come into operation (e.g. in LCLS-II),
allowing for an expansion of existing techniques and a shift of the experimental
paradigm from proof-of-concept studies towards sample- and curiosity-driven X-ray
experiments.


Chapter 18
Polarization-Sensitive Coherent
Diffractive Imaging Using HHG


Sergey Zayko, Ofer Kfir and Claus Ropers 


Abstract High harmonics generated by lasers have attractive properties
for probing ultrafast dynamics at the nanoscale. The spectral range of high harmonics
in the extreme-UV and soft-X-ray regions (λ ~ 100 nm-1 nm, ħω ~ 10 eV-1 keV) enables
element specificity, the short wavelengths combined with high spatial coherence
allow for imaging with nanometric spatial resolution, the extremely short pulse
durations provide access to dynamics faster than a femtosecond (1 fs = 10^-15 s),
and all of this on a compact system. In this chapter, we focus on experimental aspects
of imaging with high harmonics. First, we present the experimental system and
the image reconstruction procedure. Second, we show experimental results from
the various configurations that were used throughout this project. Finally, we discuss
mechanisms that played an important role in this imaging effort and would contribute
to the advancement of nanoscale imaging.


18.1 Experimental Setup 


The experimental setup for high harmonic generation is schematically depicted in
Fig. 18.1. The laser system delivers up to 8 mJ per pulse (pulse duration of 40 fs) at
1 kHz repetition rate, operating at a central wavelength of 800 nm. Initially, only a
fraction of the output power of the laser system was used (between 0.4 and 1 mJ)
for the generation of high harmonics in argon. Focusing the laser beam using a
lens with a relatively short focal length of 20 cm ensures that the field intensity is
sufficient to drive this nonlinear process. A typical Ar pressure needed for efficient
harmonic up-conversion is more than 30 mbar. However, already at this pressure,

Fig. 18.1 Experimental setup for lensless imaging with high-harmonic radiation. Femtosecond
laser pulses are focused in a gas cell filled with Ar or He. Generated harmonics are spatially 
dispersed with a toroidal diffraction grating and refocused onto a sample. The resulting diffraction 
patterns are recorded using a charge-coupled device camera. The measured spectra from Ar and He 
are shown on the left and on the right, respectively 


most of the harmonic yield will be reabsorbed within just a few mm. To mitigate
the reabsorption of the generated radiation, the interaction region is confined within
a metallic capillary (diameter of 5-10 mm) with very small entrance and exit
holes that are self-drilled by the laser. Having only small holes is beneficial, since
it reduces the overall gas pressure in the generation chamber. Generally, the short
absorption length of extreme-UV radiation requires that all experiments are carried
out in high-vacuum conditions. After the HHG beam passes through an aluminum
filter, a toroidal grating spatially disperses the different harmonic orders and refocuses
a selected order onto the sample. When the sample is removed from the beam path,
all harmonic orders are incident on a charge-coupled device (CCD) camera, and the
HHG flux can be improved by finding the optimal phase-matching conditions with an
iterative fine-tuning of the following parameters: gas pressure, laser beam diameter,
position of the capillary relative to the laser focus, and the laser pulse chirp.

The aluminum filter used in the experimental setup is a 150-nm-thick free-standing
foil that separates the generation chamber and the imaging system. It prevents oil
contamination from the roughing pump of the generation chamber and blocks (by
reflecting) any visible radiation, including the fundamental laser beam, from entering the
imaging chamber. Depending on the thickness of the aluminum oxide layer, the
transmission of the filter can be as high as 50% in the spectral range around 30 nm. The
toroidal diffraction grating (550 grooves/mm, focal length 16 cm) spatially disperses
the harmonics and refocuses them in its focal plane. In the configuration used, the
brightest harmonic order from argon (23rd order, wavelength λ = 34.8 nm) in the
plateau region of the HHG spectrum is selected and isolated by the slit. The slit
in front of the sample also reduces unnecessary stray light in the imaging
chamber.
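
As a quick consistency check of the numbers quoted here, the wavelength and photon energy of a harmonic order follow directly from the 800 nm fundamental. The function names are illustrative and not part of the original setup description.

```python
HC_EV_NM = 1239.84198  # h*c in eV*nm

def harmonic_wavelength_nm(fundamental_nm, order):
    """Wavelength of the given harmonic order of the driving laser."""
    return fundamental_nm / order

def photon_energy_ev(wavelength_nm):
    """Photon energy for a vacuum wavelength given in nm."""
    return HC_EV_NM / wavelength_nm

lam = harmonic_wavelength_nm(800.0, 23)
print(f"{lam:.1f} nm, {photon_energy_ev(lam):.1f} eV")  # 34.8 nm, 35.6 eV
```

The 23rd harmonic of 800 nm therefore reproduces the 34.8 nm stated for the selected order.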

The sample is positioned in the focus of the harmonic beam, and the light scattered
off the sample forms a diffraction pattern which is recorded downstream at distances
ranging from 15 mm to 60 mm with a cooled back-illuminated CCD camera (20 µm
pixel size, 1340 x 1300 array). When the CCD is placed closer to the sample, the
scattered light is acquired at a higher numerical aperture (NA), resulting in potentially
higher spatial resolution. On the other hand, for the reconstruction procedure to
converge, the diffraction pattern must be sufficiently oversampled [1], which requires
placing the CCD far enough from the sample. In practice, the distance is chosen as a
tradeoff between these two requirements.
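
The tradeoff between numerical aperture and oversampling can be made concrete with a short estimate. This is a sketch under stated assumptions: the sample extent of 3 µm is a hypothetical value, the shorter detector side (1300 pixels) is used for the NA, and the oversampling formula is the small-angle approximation (speckle size λz/D divided by the pixel size), which becomes crude at the shortest distances.

```python
import math

def numerical_aperture(detector_half_width_m, distance_m):
    """NA subtended by the detector edge as seen from the sample."""
    return math.sin(math.atan(detector_half_width_m / distance_m))

def linear_oversampling(wavelength_m, distance_m, pixel_m, object_size_m):
    """Small-angle estimate: speckle size (lambda*z/D) in units of the pixel size."""
    return wavelength_m * distance_m / (pixel_m * object_size_m)

wavelength = 34.8e-9              # selected harmonic
pixel = 20e-6                     # CCD pixel size
half_width = 0.5 * 1300 * pixel   # half of the shorter detector side
object_size = 3e-6                # assumed sample extent (hypothetical)

for z in (15e-3, 60e-3):
    na = numerical_aperture(half_width, z)
    half_pitch_nm = wavelength / (2 * na) * 1e9
    ratio = linear_oversampling(wavelength, z, pixel, object_size)
    print(f"z = {z * 1e3:.0f} mm: NA = {na:.2f}, "
          f"half-pitch limit = {half_pitch_nm:.0f} nm, oversampling = {ratio:.1f}")
```

Moving the detector from 60 mm to 15 mm raises the NA by about a factor of three while reducing the oversampling ratio by a factor of four, which is the competition described in the text.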

The experimental results with quasi-binary mask samples are summarized in
Fig. 18.2. The samples are prepared by focused ion beam (FIB) etching of silicon
nitride membranes coated with gold layers of different thickness: 150 nm
(Fig. 18.2c, f), 200 nm (Fig. 18.2b, e) and 460 nm (Fig. 18.2a, d). The left column
shows the measured diffraction patterns on a logarithmic scale, with the corresponding
SEM micrographs of the samples in the insets. When recording at short distances,
the central spots of the diffraction patterns can be overexposed after just a few seconds
of exposure. Thus, to record high scattering angles (carrying high-spatial-resolution
information) with sufficient signal-to-noise ratio (SNR) and to increase the dynamic
range of the data, several identical diffraction patterns (5-100) of the same sample
were captured and averaged. The diffraction patterns are not centro-symmetric,
indicating a non-trivial phase structure of the exit wave. To reconstruct real-space
images of the samples from the corresponding far-field diffraction patterns, methods
for coherent diffractive imaging (CDI) were implemented. The approach retrieves
scattering phases from the measured diffraction intensities. Further general information
on phase retrieval can be found in Chap. 6. The magnitudes of the CDI reconstructions,
i.e., the amplitudes of the light field distribution at the exit surface of the sample,
are shown in the right column of Fig. 18.2 (in an inverse gray colormap).

To investigate the capabilities of lensless imaging with a high-harmonic source we
designed various samples with different spatial features. These ranged from highly
sparse objects (Fig. 18.2b, e) to a structure with a large open area (Fig. 18.2c, f). The
diffraction pattern in the latter case requires an extremely high dynamic range,
since the central spot (mainly the direct, unscattered beam) is very intense compared to
the high-scattering-angle components. This adds complexity to the data acquisition
procedure, and typically requires a physical beam stop to block the intense center and
subsequent stitching of diffraction patterns captured with and without a beam stop.
Furthermore, the phase retrieval process in the case of a non-sparse object becomes
rather challenging, as discussed next.


18.2 Phase Retrieval of Experimental Data 


With the available detectors only far-field intensities can be recorded, while the
phase information is lost. As discussed in Chap. 5, without this information one cannot
back-propagate the measured far-field data from reciprocal to real space using, e.g.,
Kirchhoff’s diffraction formula. Once the recorded far-field intensities are phased,
the near-field information follows by a Fourier transformation in the case of
far-field diffraction. The missing-phase problem (see Chap. 5) can be solved with various




Fig. 18.2 Coherent diffractive imaging results using an illumination wavelength of 35 nm for
various samples. Left column a-c: the measured diffraction data. Right column d-f: CDI
reconstructions from the corresponding diffraction patterns. The scale bars of the reconstructions
are 1 µm, and the corresponding SEM images are placed as insets




well-established reconstruction algorithms for iterative phase retrieval described in
Chap. 6 and [2, 3]. However, in realistic experimental conditions, the diffraction
images must first undergo post-processing procedures. Furthermore, the real-space
support, which is the necessary a priori knowledge, has to be defined. In our scheme,
we used the same procedure for every sample, irrespective of its shape, to post-process
the diffraction data. The process has several steps. First, the dark counts (signal
emerging from the camera itself, irrespective of the illumination) were removed by
recording and subtracting an image without an HHG beam, i.e., a dark image. If
necessary, an additional constant offset was subtracted together with the dark image.
Second, the center of mass of each data set was used to center the
diffraction patterns. Finally, the images were mapped onto an equidistantly spaced
discrete grid in the Fourier plane to account for the distortions that arise because
a flat detector samples the curved Ewald sphere [4, 5]. This correction becomes
important at high numerical apertures.
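
The first two post-processing steps can be sketched as follows. This is a minimal illustration, not the authors' code: the function name, the clipping of negative counts, and the whole-pixel cyclic shift are assumptions, and the remapping for the Ewald-sphere curvature is deliberately left out.

```python
import numpy as np

def preprocess(raw, dark, offset=0.0):
    """Dark subtraction followed by centering on the center of mass."""
    data = raw.astype(float) - dark.astype(float) - offset
    data[data < 0] = 0.0                       # assumed: clip negative counts
    rows, cols = np.indices(data.shape)
    total = data.sum()
    com = (rows * data).sum() / total, (cols * data).sum() / total
    center = (data.shape[0] // 2, data.shape[1] // 2)
    shift = (int(round(center[0] - com[0])), int(round(center[1] - com[1])))
    return np.roll(data, shift, axis=(0, 1))   # cyclic shift by whole pixels
```

A cyclic shift is acceptable only when the signal near the detector edges is negligible; otherwise cropping or padding around the center of mass would be the safer choice.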

To determine the support of the near-field, we start from the autocorrelation of the
signal scattered from the object, that is, the Fourier transformation of the measured
far-field intensities. A more precise support can be obtained by deconvolution of the
autocorrelation function [6]. Depending on the shape of the sample, this deconvolved
support can be sufficiently tight and accurately define the transmissive parts of
the object. Having a well-defined support drastically simplifies the phase-retrieval
process. Generally, a subsequent refinement of the support can be achieved with
methods such as “shrink wrap” [7] or by simply setting a magnitude threshold on the
final reconstruction.
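
A loose, autocorrelation-based support of this kind can be sketched in a few lines. The threshold value is an assumption chosen for illustration, and the deconvolution step that tightens the support is not implemented here.

```python
import numpy as np

def autocorrelation_support(intensity, threshold=0.04):
    """Loose support estimate from the autocorrelation of the exit wave.

    `intensity` is the measured far-field intensity with the zero
    frequency at the array center.
    """
    auto = np.fft.fftshift(np.fft.ifft2(np.fft.ifftshift(intensity)))
    magnitude = np.abs(auto)
    return magnitude > threshold * magnitude.max()
```

Because the autocorrelation extends over up to twice the object size in each direction, the resulting mask is an upper bound on the object extent rather than its outline.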

To examine the phase retrieval performance and to find the most suitable
reconstruction algorithm for the data obtained experimentally, we applied and tested
multiple algorithms: ER, DM, HIO, and RAAR, with some modifications for noise
resistance [8, 9]. For HIO and RAAR with a fixed relaxation parameter β we added
additional constraints and an averaging procedure, since these methods do not tend
to stagnate [10]. The reconstruction process was run for 2000 steps, after initialization
from a random guess for the far-field phase. We found that HIO and RAAR performed
significantly better than the other algorithms, whereas ER failed to converge
for most of the experimental data. To find successful reconstructions within a run of
2000 iterations for RAAR and HIO, the real-space error (sum of counts outside the
support) in every step is compared to the errors of the two preceding steps.
Alternatively, one can calculate the far-field error by comparing the reconstructed far-field
amplitudes with the measured data. If a local minimum of the error is found, the
corresponding reconstruction is saved, keeping only the ten reconstructions
with the smallest error. Once all iterations are completed, the average of
the 10 reconstructions corresponding to the 10 minima with the lowest errors serves as
the final reconstruction [10]. We note that the averaging procedure was necessary only
for data sets recorded with a relatively low SNR [11].
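
The selection-and-averaging scheme described above can be sketched with a plain RAAR update. This is an illustration under stated assumptions, not the implementation used for the data: it uses a pure support constraint (no positivity, no noise-resistance modifications), the default β = 0.9 is a placeholder, and the saved iterates are averaged without the global-phase or translation alignment that real data may require.

```python
import numpy as np

def project_modulus(x, measured_amp):
    """Impose the measured far-field amplitudes while keeping the current phases."""
    far = np.fft.fft2(x)
    return np.fft.ifft2(measured_amp * np.exp(1j * np.angle(far)))

def raar_step(x, measured_amp, support, beta):
    """One RAAR update for a support-only real-space constraint."""
    pm = project_modulus(x, measured_amp)
    return np.where(support, pm, beta * x + (1.0 - 2.0 * beta) * pm)

def real_space_error(x, support):
    """Summed magnitude outside the support."""
    return float(np.abs(x[~support]).sum())

def reconstruct(measured_amp, support, n_iter=2000, beta=0.9, keep=10, seed=0):
    rng = np.random.default_rng(seed)
    phase = rng.uniform(-np.pi, np.pi, measured_amp.shape)  # random start
    x = np.fft.ifft2(measured_amp * np.exp(1j * phase))
    history = []  # (error, iterate) for the last three steps
    best = []     # (error, iterate) at local error minima
    for _ in range(n_iter):
        x = raar_step(x, measured_amp, support, beta)
        history = (history + [(real_space_error(x, support), x)])[-3:]
        if (len(history) == 3 and history[1][0] < history[0][0]
                and history[1][0] < history[2][0]):
            best.append(history[1])
            best = sorted(best, key=lambda item: item[0])[:keep]
    if not best:          # no local minimum found: fall back to the last iterate
        return x
    return np.mean([item[1] for item in best], axis=0)
```

Here `measured_amp` is the square root of the preprocessed intensity in unshifted FFT layout and `support` is a boolean mask. For a support-only constraint the update reduces to taking the modulus-projected field inside the support and β·x + (1 − 2β)·P_M x outside, which is what `raar_step` computes.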

For the extended (autocorrelation-based) support, employing a positivity con- 
straint in real space and limiting phase variations to less than was required for 
consistent convergence of the phase retrieval process. We found it useful to recon- 
struct an image multiple times, as described above, where a successful reconstruction 
provides a tighter support for the next reconstruction procedure. This new support
is determined as the reconstructed amplitudes above a certain threshold. A tighter
support accelerates and improves the convergence, so the positivity constraint can
be relaxed or removed completely. Alternatively, in the shrink-wrap method [7], the
autocorrelation-based support is repeatedly redefined during the first reconstruction
run by shrinking the support to include only regions above some threshold every given
number of steps. This technique, however, requires a few more fine-tuning parameters,
especially for non-sparse objects.
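
A support update in the spirit of shrink-wrap can be sketched as a blur followed by a threshold. The Gaussian width and the threshold are exactly the kind of fine-tuning parameters mentioned above; the values below are assumed placeholders, and the blur is applied in Fourier space so the sketch needs only NumPy.

```python
import numpy as np

def shrinkwrap_support(estimate, sigma_px=1.0, threshold=0.2):
    """Redefine the support from the blurred magnitude of the current estimate."""
    magnitude = np.abs(estimate)
    fy = np.fft.fftfreq(magnitude.shape[0])[:, None]
    fx = np.fft.fftfreq(magnitude.shape[1])[None, :]
    # Fourier transform of a Gaussian with standard deviation sigma_px (in pixels).
    kernel = np.exp(-2.0 * (np.pi * sigma_px) ** 2 * (fx ** 2 + fy ** 2))
    blurred = np.real(np.fft.ifft2(np.fft.fft2(magnitude) * kernel))
    return blurred > threshold * blurred.max()
```

In a reconstruction loop this function would be called every fixed number of iterations on the current real-space estimate, with the returned mask replacing the previous support.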

If the support is well-defined, we find RAAR (see Chap. 6, 6.25) to be the best
method for reconstruction among the tested ones. Starting with a relaxation
parameter β close to 1, we gradually reduced it down to 0.5 after the first few tens of
steps. In this case, the algorithm converges consistently to a very similar or an
identical solution every time, making multi-image averaging redundant (see PRTF in
Fig. 18.11 of Sect. 18.6) [10].

It is important to note that the above procedure was performed on diffraction data
with a linear oversampling of 4 and even lower, without the need for a higher
oversampling ratio to handle noise [12]. Increasing the oversampling ratio by recording at a
larger distance, and/or by using a CCD with a smaller pixel size, and/or by imaging
smaller samples adds information redundancy to the diffraction data, and thus CDI
becomes more noise tolerant. For phase retrieval, imaging smaller structures
at a given resolution (inverse numerical aperture) has two additional advantages: it
reduces the coherence requirements of the source [13, 14] as well as the dynamic
range of the scattered signal. The reduced dynamic range is achieved since
a smaller portion of the beam remains unscattered; thus, saturation effects of the
central spot of the diffraction pattern are less severe. For these reasons, the phase
retrieval process of a diffraction pattern from a smaller sample may converge to a
reasonable solution for the given parameters such as wavelength, CCD pixel size
and NA even when the diffraction data have low SNR or insufficient bandwidth.
In this regard, ptychography [15] can be, in many cases, an efficient approach. In
ptychography, multiple diffraction patterns of the same object are recorded, one for
every shift of a confined illumination. A large real-space overlap between the
illuminated regions in each acquisition provides additional redundancy in the data,
which eases the phase-retrieval process compared to a single diffraction pattern in
CDI. Clearly, this extra redundancy comes at the price of an increased exposure time
and increased requirements for the stability and positioning of the sample relative to
the beam.

A sufficient oversampling ratio is a prerequisite for CDI, but the form of a sample
may also drastically affect the phase-retrieval convergence. First of all, sparse
objects have fewer data points with unknown values, i.e., fewer pixels with values
to be determined within the real-space support. Furthermore, a well-defined
sparse object provides an accurate autocorrelation support, a crucial step for
a successful reconstruction. This further reduces the number of unknowns due to a
tighter support and “forbids” the reconstruction to move within the support. An
uncertainty in the position of the reconstruction within the support will lead to a blurred
reconstructed image, especially when a non-stagnating algorithm is used together
with an averaging over multiple reconstructions. In this regard, an object whose
autocorrelation (inverse Fourier transformation of the measured far-field intensities)
contains a cross-correlation term with a delta function (a single pixel, or close to that)
gives a significant improvement for the phase retrieval. This feature is demonstrated
in Sect. 18.5 with small holographic reference holes drilled in the vicinity of an
investigated specimen.
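
Why a point-like reference helps can be seen in a small synthetic example. All values below (array size, specimen pattern, reference position) are invented for illustration; the point is only that the specimen-reference cross term in the autocorrelation is a displaced copy of the specimen itself.

```python
import numpy as np

n = 64
scene = np.zeros((n, n))
obj = np.array([[1.0, 0.5, 0.0, 0.2],
                [0.3, 1.0, 0.7, 0.0],
                [0.0, 0.4, 1.0, 0.6],
                [0.8, 0.0, 0.2, 1.0]])
scene[10:14, 10:14] = obj          # hypothetical specimen
ref = (40, 40)
scene[ref] = 1.0                   # single-pixel reference hole

intensity = np.abs(np.fft.fft2(scene)) ** 2   # what the detector records
autocorr = np.real(np.fft.ifft2(intensity))   # inverse FT of the intensities

# The specimen-reference cross term is a copy of the specimen displaced by
# the reference position; shifting it back recovers the specimen directly.
recovered = np.roll(autocorr, ref, axis=(0, 1))[10:14, 10:14]
print(np.allclose(recovered, obj))  # True
```

The copy is clean only because the reference sits far enough from the specimen that the cross terms do not overlap with the central self-correlation, which is the usual separation condition in Fourier-transform holography.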

The parameters discussed above also affect the number of steps required for
phase retrieval. For instance, in similar experimental conditions, a reconstruction
based on the RAAR algorithm was accurate after just over 50 steps (Fig. 18.2b, e and
the data in Sect. 18.5), or required hundreds to thousands of iterations (Fig. 18.2a, c, d, f).


18.3 Experimental Results 


The reconstructions in Fig. 18.2 are in good agreement with the SEM micrographs.
However, further inspection of the experimental results reveals field and phase
modulations that, at first glance, may not be expected from a binary opaque transmission
mask. Interestingly, such modulations have not been identified or reported in the
literature, even for very similar experimental conditions. This might be because
the achieved spatial resolution was not sufficient to accurately resolve such small
features. In this case, the interpretation of the experimental results as well as an
estimate of the achieved spatial resolution might be misleading. In the following, we
show that the origin of these modulations is associated with the light propagation
and multiple scattering within the object (in this case an opaque binary mask) itself.

The reconstructed field (the field profile at the sample’s exit surface) relates
to a product of the incident field profile and a non-scalar transmission function
of the object. While Si3N4 as well as gold are optically thick media at extreme-UV
wavelengths, the light propagating through the removed regions of the sample
can be represented as a sum of discrete propagating eigenmodes, equivalent to
the propagation of an electromagnetic wave in a waveguide. These modes propagate
through the sample to its exit surface, where they scatter and freely travel to the
detector. Clearly, the exit-surface wave is a superposition of these propagating modes,
and the observed modulation in the reconstructed image results from multi-mode
interference. In the following, we perform 2D and 3D numerical simulations using
finite element modeling and a semi-analytical solution to corroborate the experimental
findings.

Figure 18.3a shows a 2D numerical simulation of light propagation through slab 
waveguides of different width with geometries similar to the experimental conditions 
(marked with three solid lines L1—L3 in Fig. 18.2d). The material properties and 
wavelengths are as in the experiment as well. The field distributions at the exit 
surface of the waveguides are plotted with red dashed lines for numerically simulated 
data in the right column of Fig. 18.3a. The solid blue lines depict the experimental 
field values obtained from lineouts L1-L3 in CDI reconstruction. Similarly, the 
structure marked with a red dashed rectangle in Fig. 18.2f can be approximated as 2D 
slab waveguides of different width and the expected exit surface field distribution 

Fig. 18.3 Interpretation of the experimental results: waveguiding at extreme-UV. a Numerical
simulations of light propagation in slab waveguides for the wavelengths and materials used in the
experiment. The waveguide dimensions correspond to the regions of the sample marked with L1,
L2 and L3 in (a) and C1, C2 and C3 in (c). Solid blue lines: measured data, red dashed lines:
simulated data. b Three-dimensional simulation of light propagation in rectangular waveguides of
various sizes with our experimental conditions

can be simulated. The blue solid lines and the red dashed lines in the bottom of
Fig. 18.3c are the experimental and simulated lineouts for the regions marked as
C1, C2 and C3. For the structure shown in Fig. 18.2b, e, the aspect ratios of individual
features (waveguides) are not as high as in the case of the structures shown in Fig.
18.2d, f. Therefore, here the approximation of the structure with a 2D model as a slab
waveguide is not accurate and a 3D simulation is required. Figure 18.3b compares
the expected exit surface fields (simulated using 3D finite-element modeling) on
the left with the experimentally measured ones on the right (from the reconstruction
shown in Fig. 18.2e) for two different waveguides. Again, as in the case of the
other samples, the reconstructed field distribution is in close agreement with the
simulated data.

Further insights into waveguiding at extreme-UV frequencies and fundamental 
reasons for mode beating at the exit surface of CDI reconstructions follow from a 
semi-analytical solution of the mode propagation within the structure using eigen- 
mode expansion. Figure 18.4a demonstrates field distribution (field profile) of the 
first three eigenmodes in a gold-cladded slab waveguide. Figure 18.4b shows the 
computed transmission of these modes through a 700-nm-long waveguide as a func- 
tion of waveguide’s width. Here, TE and TM modes correspond to the polarizations 
parallel and perpendicular to the cladding, respectively. We note that only even order 

Fig. 18.4 Analytical solution for waveguiding for interpretation of the experimental results shown
in Fig. 18.3. a Field profiles of first three allowed eigenmodes in a symmetrical slab waveguide. 
b Mode transmission through 700-nm-long gold waveguide with perpendicular (TM) and parallel 
(TE) incident polarization 

Fig. 18.5 Coherent diffractive imaging using illumination wavelengths of 30 nm (a, c) and 47 nm
(b, d). The damping of higher-order modes is already evident from the far-field diffraction patterns
and is accurately reproduced in the CDI reconstructions


modes are supported by a symmetrical waveguide. As expected, higher order modes 
experience stronger damping in narrow waveguides. Thus, the relative intensities of 
these modes at the exit surface are governed by the waveguide dimensions and the 
intensity profile resulting from a superposition of these modes at the waveguide’s 
exit can strongly differ for very similar geometries, e.g., a slight width difference. 
Similarly, the illuminating wavelength affects the mode distribution at the exit
plane. Figure 18.5 illustrates the experimental results for the same object imaged
with a wavelength of 30 nm (a, c) and 47 nm (b, d). The zoomed regions (insets)
emphasize the difference between the reconstructed fields. The image obtained with the
longer wavelength contains predominantly the fundamental mode, whereas the image
obtained with the 30 nm wavelength exhibits a complex mode-beating profile. Notably,
this feature is already evident from the corresponding diffraction patterns, where
the maximum spatial frequency (spatial frequency with sufficient SNR) scattered off
the sample at 47 nm wavelength is much lower than those present in Fig. 18.5a,
recorded with 30 nm illuminating wavelength. High-order waveguide modes for
the longer wavelength are strongly damped due to multiple scattering within the object
and do not pass through the structure to its exit surface; they are therefore absent
(heavily suppressed) in the far-field image as well as in the reconstruction. Clearly, for
such high-aspect-ratio structures, or for structures comparable in size to the illuminating
wavelength, the sharpness of the CDI reconstruction will be limited by the highest
eigenmode transmitted. Therefore, using a knife-edge technique without accounting
for wave propagation effects in such structures might lead to a strong overestimation.

We note, however, that a similar mode-beating profile can be observed when the 
high spatial frequencies are not fully recorded in the far-field, i.e., span beyond the 
CCD edges. Such a truncation of the far-field intensities determines the upper limit 
for the spatial resolution of the reconstructed image. 

To obtain high spatial resolution one needs to collect data at high NA. However, in
most cases the signal-to-noise ratio at high scattering angles may be very low, which
limits the maximum spatial frequency that can be accurately reconstructed
with a phase retrieval algorithm and, consequently, the achieved spatial resolution.
This is demonstrated in Fig. 18.6, where two diffraction patterns of the same structure
are recorded with the same wavelength (35 nm) and the same NA (0.5). The
top row shows a single diffraction pattern recorded for 5 s and the corresponding
reconstruction. The bottom row is an average of 200 individual exposures (HDR
data), so the signal-to-noise ratio, especially at high scattering angles (cf. insets), is
improved. The higher SNR also carries over to the corresponding CDI
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Fig. 18.6 Coherent diffractive imaging with high dynamic range data demonstrates influence of 
SNR ratio at high scattering angles to the reconstruction quality. Note that fine features of the 
wavegude mode beating pattern cannot be resolved from the non-HDR (top) diffraction data with 
low SNR ratio. Adapted from [10] 
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reconstruction demonstrating a sharper image. While both reconstructions are in a 
good agreement with the SEM micrographs, the second CDI reconstruction from 
HDR data contains finer features that cannot be resolved in the first reconstruction. 
This is because the diffraction signal from smaller features scatters at higher angles 
where SNR is noticeably lower. Increasing the dynamic range by multi-exposure 
averaging results in a better SNR at high scattering angles and improves spatial reso- 
lution, although both images were recorded at the same NA. Clearly, poor quality of 
a diffraction pattern and/or non-accurate phase retrieval impede imaging with high 
resolution irrespective of how high the NA is. 
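
As a rough guide to the numbers quoted for Fig. 18.6, the diffraction-limited half-period resolution is commonly estimated as λ/(2 NA). The following minimal sketch assumes this textbook Abbe-type estimate, which is not stated explicitly in the text; the function name is illustrative only.

```python
def half_period_resolution_nm(wavelength_nm: float, numerical_aperture: float) -> float:
    """Abbe-type estimate of the smallest resolvable half period, in nm.

    Assumption: resolution = wavelength / (2 * NA), valid for NA in (0, 1] in vacuum.
    """
    if not 0.0 < numerical_aperture <= 1.0:
        raise ValueError("numerical aperture must lie in (0, 1] in vacuum")
    return wavelength_nm / (2.0 * numerical_aperture)


# Parameters quoted for Fig. 18.6: 35 nm wavelength, NA = 0.5
print(half_period_resolution_nm(35.0, 0.5))  # 35.0
```

This value is only an upper bound on what the geometry permits; as the comparison of the two data sets shows, the resolution actually reached also depends on the SNR at the highest recorded angles.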


18.4 Polarization Dependence 


Figure 18.4b shows a simulation of wave propagation within a gold-cladded waveguide.
As discussed in Sect. 18.3, for a relatively narrow waveguide (narrower than
70 nm) higher-order modes can be fully damped, whereas the fundamental mode
may still be transmitted with relatively low losses. The narrower the slit, the stronger
the polarization anisotropy, i.e., the transmission difference between polarization
parallel (TE mode) and perpendicular (TM mode) to the walls of the waveguide. The stronger
suppression of the TM polarization can be explained by the fact that perpendicular fields
penetrate deeper into the gold cladding, where they decay exponentially. Interestingly,
this polarization-dependent transmission is opposite to that of
wire-grid polarizers, where the perpendicular polarization is transmitted.

To investigate this phenomenon and its effects in nanoscale imaging with extreme-UV
radiation, we designed and fabricated a structure with an angular arrangement of
identical 50 nm-wide slits, etched in a gold-coated Si₃N₄ membrane. The SEM image
of the structure is shown in Fig. 18.7a. The diffraction pattern shown in Fig. 18.7b
was recorded with S-polarized illumination at a wavelength of 35 nm. The corresponding
CDI reconstruction (exit-field intensity) is shown in Fig. 18.7c. The reconstruction
reveals that slits parallel to the field polarization appear noticeably brighter than
those perpendicular to the electric field. Figure 18.7d plots the field intensity
transmitted through each slit as a function of the angle between the slit orientation
and the polarization, in comparison with Malus’ law for an imperfect polarizer. The
reconstructed field (red circles) and the measured angular far-field (blue triangles)
accurately follow the predicted pattern. The measurement was repeated for multiple
linear-polarization states to verify that the polarization dependence originates solely
from the sample, and that the slits are indeed identical. In contrast to the far-field pattern,
where it is impossible to disentangle contributions from parallel slits, the intensity
in the reconstructed image provides information for each slit individually.
Based on the quantitative information from the CDI reconstruction and the simulated
polarization dependence for such a structure, we estimated the width of the slits to be
52 nm.
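
The angular dependence used for the comparison in Fig. 18.7d can be sketched as follows. This is a minimal illustration that assumes the usual form of Malus' law for an imperfect polarizer, T(θ) = T∥ cos²θ + T⊥ sin²θ; the transmission values below are hypothetical placeholders, not the measured ones.

```python
import numpy as np


def malus_imperfect(theta_deg, t_parallel, t_perpendicular):
    """Transmitted intensity versus angle between slit and polarization.

    Assumed model: T = T_par * cos^2(theta) + T_perp * sin^2(theta).
    """
    theta = np.deg2rad(theta_deg)
    return t_parallel * np.cos(theta) ** 2 + t_perpendicular * np.sin(theta) ** 2


def extinction_ratio(t_parallel, t_perpendicular):
    """Ratio of the transmissions for parallel and perpendicular polarization."""
    return t_parallel / t_perpendicular


# Hypothetical transmissions, for illustration only
T_PAR, T_PERP = 1.0, 0.2
angles = np.arange(0, 181, 30)
for angle, value in zip(angles, malus_imperfect(angles, T_PAR, T_PERP)):
    print(f"{angle:3d} deg  T = {value:.3f}")
```

Fitting such a curve to the per-slit intensities from the reconstruction yields the two transmissions, which can then be compared with simulations for different slit widths.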

This experiment brought three insights to lensless imaging with a high-harmonic
source:

1. In the extreme-UV and soft-X-ray range the scalar projection approximation is not
valid. Instead, CDI can be used to accurately and quantitatively map polarization
anisotropies and waveguiding effects at nanoscales.

2. A structure with a nanoscale slit arrangement provides information on the polarization
state of the incident extreme-UV light in a single acquisition, in contrast to
a conventional reflection-based polarimeter, where the incident polarization
can be estimated only from a series of measurements at various angles
[16]. The polarization analyzer shown in Fig. 18.7a proved very useful in
subsequent experiments that required optimization of the polarization state of the
extreme-UV light.

3. A structure with an array of only parallel nanoscale slits can serve as an effective
polarizer for extreme-UV radiation, as demonstrated in Fig. 18.8. Here, the
diffraction patterns contain detectable scattering signal only from the polarization
component parallel to the slits. The polarization anisotropy was verified
by rotating the sample and by rotating the incident polarization.

Fig. 18.7 Extreme-UV polarimetry. a An SEM image of a structure with nanoscale slits. The width
of the slits is identical. b The diffraction pattern in logarithmic scale. c The reconstructed intensity
at the exit surface of the sample obtained by CDI from (b). d An analysis of the experimental data
showing polarization-dependent transmission through the nanoscale slits. Experimental data from
the CDI reconstruction and from the diffraction pattern (red circles and blue triangles, respectively).
The line is Malus’ law for an imperfect polarizer

Fig. 18.8 Polarizer for extreme-UV radiation. a An SEM image of the structure with nanoscale
slits of a typical width of 40 nm. The structure demonstrates polarization-dependent transmission
with a high TE/TM extinction ratio


18.5 Magneto-Optical Imaging Using High-Harmonic Radiation


Recent developments in the generation of high harmonics with arbitrary polarization
[17, 18] allow access to X-ray magnetic circular dichroism (XMCD). The
M-edge absorption lines in the important 3d-ferromagnets Fe (52 eV), Co (60 eV)
and Ni (75 eV) are within the spectral range of a typical HHG source based on a
Ti:Sapphire amplified laser [19, 20], and additional materials (e.g. Gd, 145 eV) can
be accessed using HHG sources in the soft-X-ray range [21]. Circularly polarized high
harmonics are generated by a bi-chromatic laser field in a gas (typically He). The laser field
combines the fundamental laser and its second harmonic, λ = 800 nm and λ = 400 nm,
respectively. The driving fields are circularly polarized with opposite handedness;
the selection rule for the high harmonics thus imposes the suppression of every third
harmonic order, while the allowed harmonics are circularly polarized [17, 22]. In
this project, we implemented two schemes for circularly polarized high-harmonic
generation, as described in [17, 18]. We find that the latter scheme, namely an in-line
MAZEL-TOV device (MAch-ZEhnder-Less for Threefold Optical Virginia
spiderwort), is robust and reliable, due to its inherent transmission geometry.
This device offers a drastically simplified alignment process compared to the
scheme involving a Mach-Zehnder interferometer. The use of a single quarter-wave
retarder to set the polarization of the bi-chromatic field enables direct control of the
laser polarization in the gas and, therefore, of the HHG polarization. For example,
a reflection from a tilted surface may result in an unequal amplitude or phase for
the incident TE or TM polarizations, thus deteriorating the laser’s degree of circular
polarization. For the detection of XMCD contrast, direct polarization control is
crucial: the helicity of the circularly polarized HHG is flipped from left-handed to
right-handed by setting the quarter-wave plate (QWP) retarder to 45° or to −45°.
Furthermore, the MAZEL-TOV apparatus enables a straightforward fine-tuning of
the recollision process in a way that allows access to any harmonic order with
circular polarization, even including the typically suppressed harmonic orders (e.g.
36, 39, ...) [23].
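
The selection rule described above can be illustrated with a short sketch. It assumes the ideal case of a perfectly counter-rotating ω + 2ω driver, for which the orders 3n are suppressed and the orders 3n ± 1 are emitted; the constant hc ≈ 1239.84 eV nm and the function names are introduced here for illustration only.

```python
HC_EV_NM = 1239.84  # h*c in eV nm (rounded)


def is_allowed(order: int) -> bool:
    """Ideal counter-rotating w + 2w field: orders 3n are suppressed."""
    return order % 3 != 0


def photon_energy_ev(order: int, fundamental_nm: float = 800.0) -> float:
    """Photon energy of a harmonic order of the fundamental wavelength."""
    return order * HC_EV_NM / fundamental_nm


for q in range(34, 41):
    status = "allowed" if is_allowed(q) else "suppressed"
    print(f"order {q}: {status:10s} {photon_energy_ev(q):.1f} eV")
```

In this idealized picture the 38th order (about 59 eV, close to the Co M-edge) is allowed, while orders 36 and 39 are suppressed unless the recollision process is fine-tuned as described in [23].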

The experimental scheme for magnetic imaging with high harmonics is depicted
in Fig. 18.9. The HHG source is based on the setup described in Sect. 18.3, with
some modifications [20, 24]. First, Ar gas was replaced with He, which has a higher
ionization potential and, thus, generates higher harmonic orders [25]. Specifically,
we can access the 38th harmonic order in He and in Ne, providing the optimal
XMCD contrast at the M-edge of cobalt [26]. Helium was preferred over neon,
since its price is significantly lower and its smaller absorption coefficient may
result in brighter harmonics with a better beam quality. However, the pulse energy
(2–3.5 mJ) required to drive the HHG process in He is higher than for Ar due to the
higher ionization potential and higher phase-matching pressures. Second, the need
for circularly polarized HHG required the use of a MAZEL-TOV device. This device
is inserted after the focusing lens to convert the linearly polarized driving laser field
into a bi-chromatic counter-rotating field. Finally, the interaction region, i.e., the
focus of the fundamental beam, was positioned further away from the diffraction
grating. This allowed us to illuminate a larger number of slits on the grating, which
improves the spectral dispersion of the toroidal grating. Increasing the dispersion
improved the temporal coherence of the beam in the sample region, and thus allows
for a higher spatial resolution [1]. We note that, for the previous experiments described in
Sect. 18.3, the estimated monochromaticity λ/Δλ was larger than 500, corresponding
to a spatial resolution down to ~30 nm [1, 12]. For the magneto-optical imaging
experiments, the monochromaticity was increased to enable an improved resolution
beyond 20 nm. In a more recent development, the sample was illuminated with a
replica of the fundamental beam (λ = 800 nm) at a controlled time delay
with respect to the HHG beam. This pump-probe addition allows imaging experiments
to be combined with femtosecond time resolution, enabling movies of ultrafast
magnetic dynamics at the nanoscale.

Fig. 18.9 Experimental setup for nanoscale magnetic imaging using high-harmonic radiation.
a A scheme of the experiment. b A measured spectrum generated using a circularly polarized
bi-chromatic field tailored with the MAZEL-TOV device. The suppression of every third harmonic order
(e.g. ..., 30, 33, 36, ...) indicates that the harmonics are circularly polarized. The toroidal grating
focuses the harmonics on the sample plane, and a slit selects the 38th harmonic order (59 eV),
which provides the optimal XMCD contrast in Co. To isolate the magneto-optical signal, two
diffraction patterns are recorded, one with a left- and one with a right-handed circularly polarized
HHG beam. The sample includes the region of interest (central aperture) and four fully drilled
reference holes, which provide a strong reference field that interferes in the far-field with the scattering
from the magnetic sample

In order to isolate the magnetic signal from the non-magnetic background, two diffraction
patterns with opposite helicities were recorded (cf. L and R in Figs. 18.9 and
18.10 for the illumination with the left- and right-handed circularly polarized 38th
harmonic, respectively). The HHG helicity (L vs. R) is easily switched by rotating the
quarter-wave plate of the MAZEL-TOV device from 45° to −45°, and vice versa.
For each helicity, the dynamic range of the diffraction pattern was increased by combining
two exposure times. A long exposure of up to 10 min per helicity provided the
diffraction pattern for the medium and high scattering angles with sufficient SNR.
A short exposure (typically several images of a few seconds each) captured
the low scattering angles without any saturation effects. For example, the diffraction
patterns shown in Fig. 18.10a are composed of scattering data from an exposure time
of 10 min and the average of 24 frames of 5 s exposure each. The diffraction patterns
were scaled properly and stitched to form a single diffraction pattern with high
dynamic range. Finally, the diffraction patterns were prepared for reconstruction as
described in Sect. 18.1.

Fig. 18.10 Lensless imaging of nanoscale magnetic domains using high-harmonic radiation.
a Holographic diffraction patterns recorded with the left- (L) and right-handed (R) circularly
polarized 38th harmonic order. b A full-field Fourier transform holography (FTH) reconstruction.
c The magneto-optical absorption (top) and phase contrast (bottom) reconstructions as recovered
with FTH. The spatial resolution is estimated to be between 150 nm and 200 nm. d The magneto-optical
absorption and phase contrast, as reconstructed via CDI. Using CDI, the spatial resolution reaches
below 50 nm
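
The scaling and stitching step can be sketched as follows. This is a minimal illustration under stated assumptions, not the processing used for Fig. 18.10: both data sets are converted to counts per second, and pixels saturated in the long exposure are replaced by the averaged short exposures. The function name, the saturation level and the synthetic data are hypothetical.

```python
import numpy as np


def merge_hdr(long_img, short_imgs, t_long, t_short, saturation):
    """Merge one long exposure with several short ones (result in counts/s).

    Pixels at or above `saturation` in the long exposure are taken from the
    averaged short exposures; all other pixels come from the long exposure.
    """
    long_img = np.asarray(long_img, dtype=float)
    long_rate = long_img / t_long
    short_rate = np.mean(np.asarray(short_imgs, dtype=float), axis=0) / t_short
    saturated = long_img >= saturation
    return np.where(saturated, short_rate, long_rate)


# Synthetic, noise-free example: three pixels with very different count rates
true_rate = np.array([10000.0, 50.0, 0.5])           # counts per second
SATURATION = 65535.0                                  # assumed 16-bit detector
long_exposure = np.minimum(true_rate * 600.0, SATURATION)
short_exposures = np.tile(true_rate * 5.0, (24, 1))   # 24 frames of 5 s each

merged = merge_hdr(long_exposure, short_exposures, 600.0, 5.0, SATURATION)
print(merged)
```

In practice a smooth transition region and a noise-weighted average are often preferred over a hard mask; the hard mask is used here only to keep the example short.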

The phase retrieval process used for the magneto-optical images is similar to the
one described in Sect. 18.2. As the real-space support, a single cross-correlation of the
diffraction pattern was used. Since we used reference holes for the magneto-optical
experiments, the cross-correlation includes accurate replicas of the structure, and thus
dramatically improves the convergence of the phase retrieval algorithm. For samples
with reference holes, the required number of RAAR steps can be reduced to about
100. The combination of FTH with CDI allows for a direct low-resolution but
noise-tolerant reconstruction by a single-step Fourier transformation, and a high-resolution
CDI reconstruction. The experimental results from the magnetic structures are summarized
in Fig. 18.10. First, two diffraction patterns from left- and right-handed circularly
polarized illumination are recorded (cf. L and R in (a), shown in logarithmic
scale). Second, a single Fourier transformation of the measured far-field intensities
provides the holographic reconstruction (Fourier transform holography, FTH).
Finally, the real-space support is derived from the FTH, thus assisting the iterative
phase retrieval to recover the high-resolution image. Figure 18.10b shows the
magneto-optical amplitude for the holographic reconstructions. Since the sample had
four reference holes, the FTH reconstructs eight replicas of the sample, namely a
reconstruction and its complex conjugate for each reference hole. The magneto-optical
amplitude (top) and phase (bottom) contrasts originating from the smallest reference
hole (approx. 200 nm in diameter) are shown in Fig. 18.10c. The magnetic phase
contrast is the phase difference between the reconstructions recorded with left- and
right-handed HHG helicity. The corresponding magneto-optical contrast CDI reconstructions
based on the RAAR algorithm are shown in Fig. 18.10d. Both the FTH and
the CDI reconstructions show the same pattern of magnetic domains. The resolution
of the CDI reconstruction is clearly higher (below 50 nm), since it is limited by the
NA of the recorded far-field, whereas the resolution of the FTH reconstruction is set
by the pre-drilled reference hole. Notably, the CDI reconstruction in this case has
multiple binary-like transitions between up and down magnetization, where the
domain transition region is below a single pixel in a vast region of the image. Since
the domain-wall width for this Co/Pd multilayer structure is expected to be 10 nm
to 15 nm, it can serve as a test sample for higher-resolution imaging, even below
the illuminating wavelength (λ of the 38th harmonic = 21 nm), to investigate the capabilities of
HHG for magnetic imaging.
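
The single-step holographic reconstruction can be illustrated with a small numerical sketch. It assumes an idealized, noise-free hologram and a point-like reference; the array size, positions and amplitudes are hypothetical. The inverse Fourier transform of the far-field intensity is the autocorrelation of the exit field, which contains a replica of the object displaced by the object-reference separation.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 64
field = np.zeros((n, n), dtype=complex)

# Hypothetical object: a 5x5 patch of complex transmission values
obj = rng.uniform(0.5, 1.0, (5, 5)) * np.exp(1j * rng.uniform(-0.3, 0.3, (5, 5)))
field[30:35, 30:35] = obj

# Idealized point-like reference at a hypothetical position
ref_row, ref_col = 10, 10
ref_amp = 5.0
field[ref_row, ref_col] = ref_amp

# The detector records only the far-field intensity
hologram = np.abs(np.fft.fft2(field)) ** 2

# One inverse Fourier transform gives the autocorrelation of the exit field
autocorr = np.fft.ifft2(hologram)

# The object-reference cross term appears displaced by minus the reference position
replica = autocorr[30 - ref_row:35 - ref_row, 30 - ref_col:35 - ref_col] / np.conj(ref_amp)

print(np.allclose(replica, obj))
```

With a reference of finite size the replica is blurred by the reference exit field, which is why the FTH resolution is set by the reference hole, whereas the subsequent CDI step is not bound by it.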

In principle, the size of the reference holes can be reduced in order to improve the
spatial resolution of the Fourier-transform hologram. However, the intensity of the
light transmitted through a small reference hole is significantly lower due to propagation
effects in the narrow channel (see Sect. 18.3). As a result, longer exposure
times would be required. Additionally, the manufacturing of narrow reference holes
is a technical challenge, limiting the repeatability of the experiment. A convenient
approach in FTH is to use reference holes of slightly varying sizes. Thus, the
achievable spatial resolution (hole size) and the image contrast (hole transmission)
can be determined while analyzing the scattering data. In a later section we show
that adding large and strongly scattering reference holes to the sample assists the
CDI reconstruction in a few ways [20]. First, determining the support from the
cross-correlation is much easier and the phase retrieval algorithm converges faster when
reference holes are introduced. Second, the interference of weak scattering signals
with a strong reference field enhances the weak signal so that it can be detected
above the instrumental noise level [27, 28]. In contrast to FTH, the size of the reference
holes plays a secondary role in determining the resolution in CDI experiments. To
demonstrate this, we designed a similar structure with reference holes larger than the
features to be observed.

Figure 18.11 shows the imaging of worm-like domains in a Co/Pd multilayer stack
in a field of view with a diameter of 4 µm, for which the reference holes in the mask had
a diameter of 500–600 nm. These diameters are twice as large as the typical size of
the magnetic domains and an order of magnitude larger than the final resolution
of the images obtained via CDI reconstructions. Figure 18.11a shows the diffraction
pattern recorded with left-handed circular polarization and Fig. 18.11b shows the
field magnitude of the corresponding CDI reconstruction. Note that the image is
presented with true-pixel resolution and contains features on the order of a single
pixel. Figure 18.11c represents the phase-contrast dichroic image, which is the angle
of the ratio of two reconstructions recorded with opposite helicities. Despite the fact
that the reference holes are too large to resolve individual magnetic domains via FTH,
CDI successfully retrieves high-resolution information by finding the far-field phase
at high NA. To estimate the achieved spatial resolution, we use the phase retrieval
transfer function (PRTF) [29]. Here, the PRTF is calculated as the average far-field
phase of 20 reconstructions that were initiated from a random first-guess phase.
Figure 18.11d shows that the phase retrieval process is consistent throughout the
entire far-field. A reconstructed phase reaching beyond 15 µm⁻¹ corresponds to
a spatial resolution better than 33 nm. However, since the truncation of the diffraction
patterns at the physical edges of the CCD begins just above 10 µm⁻¹, the spatial
resolution that can be claimed is limited by this value to 50 nm (single-pixel
resolution).

Fig. 18.11 Magnetic imaging using large reference holes. a Holographic diffraction pattern
recorded with left-handed circularly polarized light. The inset highlights the very good speckle
visibility at high scattering angles. b CDI reconstruction of (a) for a single-helicity illumination. The image
contains magnetic as well as non-magnetic contributions. c Magnetic phase-contrast CDI
reconstruction, i.e., the isolated XMCD signal. d PRTF for 20 reconstructions initiated from a random first
guess, indicating a consistent convergence for all spatial frequencies recorded. e Exit fields of the
reference holes (interpolated), demonstrating strong intensity modulations due to waveguiding and
Fourier-truncation effects
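
The conversion between the highest consistently phased spatial frequency and the quoted resolution can be sketched as follows. The sketch assumes the half-period convention (resolution = 1/(2q)), which reproduces the numbers in the text, and uses the modulus of the run-averaged unit phasor as a simple phase-consistency measure; the latter is a common simplification and not necessarily the exact PRTF definition of [29]. The function names are illustrative.

```python
import numpy as np


def resolution_from_frequency_nm(q_inv_um: float) -> float:
    """Half-period resolution in nm for a spatial frequency q given in 1/µm."""
    return 1000.0 / (2.0 * q_inv_um)


def phase_consistency(phases):
    """|mean over runs of exp(i*phase)| per far-field pixel.

    `phases` has shape (runs, ...). A value of 1 means that all runs agree,
    values near 0 mean that the retrieved phase is essentially random.
    """
    return np.abs(np.mean(np.exp(1j * np.asarray(phases, dtype=float)), axis=0))


print(round(resolution_from_frequency_nm(15.0), 1))  # 33.3
print(resolution_from_frequency_nm(10.0))            # 50.0
```

Before averaging, the individual reconstructions have to be registered to each other (global phase and position), otherwise trivial shifts would lower the consistency.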


18.6 Dichroic Imaging 


The extreme-UV light propagating in the reference holes undergoes the same waveguiding
effects as described in Sect. 18.3 (cf. Fig. 18.11e). A similar effect is noted
near the edges of the central aperture (cf. Fig. 18.11d). Generally, wave effects in
the illuminating field reduce the image quality and add artifacts [11] that are associated
solely with the imaging scheme (the Fresnel number for the used wavelength
and sample geometry). In contrast, dichroic imaging provides a unique type of
microscopy, which can eliminate these effects in the reconstructed image. When
the magnetization map is obtained from the ratio of two reconstructed images with
opposite helicities (L and R helicities), the artifacts vanish. The dichroic contrast
isolates the magnetic (dichroic) signal from the non-magnetic background.

Fig. 18.12 Elimination of waveguiding and edge-diffraction artifacts with dichroic imaging. Left and
middle images are absorption-contrast reconstructions for left- (L) and right-handed (R) circularly
polarized illumination. Wave modulations from edge-diffraction effects are observed and cannot
be separated from the magnetic signal. Right image: the dichroic image, i.e., the ratio L/R, isolates the
magnetic signal from non-magnetic contributions

Figure 18.12 shows a part (lower left quarter) of the central aperture for left- and
for right-handed circularly polarized illumination, marked as L and R. The image for
each helicity exhibits fine intensity modulations on the order of a single pixel, mainly
near the edge of the aperture. These fine modulations are real but cancel each other
out in the dichroic image (right image in Fig. 18.12, the ratio L/R), provided
that the reconstructions from opposite helicities are accurately overlaid. To consistently
position the two reconstructions with sub-pixel accuracy, it is convenient to match
the lower-order moments of their far-field phase (i.e., the global phase and the phase
gradients). To do so, we fully reconstruct the data recorded for one helicity, and use
its far-field phase as an initial guess for the phase retrieval of the opposite-helicity
diffraction pattern. Since the dichroic component is only a small part of the
far-field, only a few iterations are required for a full phase retrieval when starting
from the far-field phase of the opposite helicity. This provides robust and
reliable imaging of magnetic samples without artifacts or smearing caused by
reconstruction shifts and waveguiding effects.
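
The cancellation of non-magnetic modulations in the ratio image can be illustrated numerically. The sketch below assumes a simple toy model in which the two exit waves share a common complex factor (illumination, waveguiding and edge-diffraction modulations) and differ only by a small dichroic term of opposite sign; the magnitudes d_abs and d_phase and the random maps are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
shape = (32, 32)

# Hypothetical non-magnetic exit wave with strong amplitude and phase modulations
common = rng.uniform(0.2, 1.0, shape) * np.exp(1j * rng.uniform(-1.0, 1.0, shape))

# Hypothetical magnetization map (+1 / -1 domains) and small dichroic terms
m = np.where(rng.standard_normal(shape) >= 0.0, 1.0, -1.0)
d_abs, d_phase = 0.02, 0.05

# Assumed toy model: the dichroic term changes sign with the helicity
exit_L = common * np.exp((-d_abs + 1j * d_phase) * m)
exit_R = common * np.exp((+d_abs - 1j * d_phase) * m)

ratio = exit_L / exit_R
phase_contrast = np.angle(ratio)             # equals  2 * d_phase * m
absorption_contrast = np.log(np.abs(ratio))  # equals -2 * d_abs * m

print(np.allclose(phase_contrast, 2 * d_phase * m))
```

The common factor drops out exactly only if the two reconstructions are registered to each other, which is the reason for seeding the second phase retrieval with the far-field phase of the first.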


18.7 Signal Enhancement Mechanism 


A strong reference or auxiliary wave in the vicinity of the field of view is
advantageous for recording diffraction data. Specifically, a strong scattering field from
multiple reference holes interferes on the CCD detector with the weak magneto-optical
scattering signal. Thus, photons carrying magnetic information can be
detected above the instrumental noise, and the exposure times can be drastically
reduced. Figure 18.13 shows a comparison of two diffraction patterns recorded from
the same structure, albeit with a different intensity of the reference field. When the
reference holes are small (left), the signal includes mostly the low-angle scattering
from the central aperture, and the high-resolution information is buried and lost
in the noise. When the reference holes are large (right), a meaningful portion of the
light passes through them and scatters to higher angles, thus lifting the signal above the level
of the instrumental noise of the camera. In this way, the auxiliary field brings the weak
magneto-optical scattering above the noise through interference. In this example, the
scattering from the waveguides is two orders of magnitude higher when the reference
holes are large, which means that, through interference, the magneto-optical signal
is enhanced by almost one order of magnitude.

Fig. 18.13 Signal enhancement effect through the interference with a strong reference wave. (left)
A diffraction pattern from a sample with very narrow reference holes (sub-100 nm). (right) A
diffraction pattern from the same sample with large reference holes (500–600 nm). The larger
reference holes result in a high signal-to-noise ratio at higher scattering angles. The weak
magneto-optical signal scales as the square root of the strong reference intensity, because it is an interference
effect. Since the reference holes allow the auxiliary field to cover the entire CCD, the weak signal
is amplified above the noise throughout the recorded diffraction pattern, and a high-resolution image
can be reconstructed
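
The square-root scaling behind this estimate can be checked with a few lines. The sketch assumes the standard two-beam interference expression |E_ref + E_sig|² = I_ref + I_sig + 2·sqrt(I_ref·I_sig)·cos(Δφ) and evaluates the cross term for the in-phase case; the intensity values are hypothetical.

```python
import math


def interference_term(i_ref: float, i_sig: float) -> float:
    """Maximum cross term 2*sqrt(I_ref*I_sig) of |E_ref + E_sig|^2 (in-phase case)."""
    return 2.0 * math.sqrt(i_ref * i_sig)


i_sig = 1e-4                        # hypothetical weak magneto-optical intensity
small_ref, large_ref = 1.0, 100.0   # reference intensity raised by two orders of magnitude

gain = interference_term(large_ref, i_sig) / interference_term(small_ref, i_sig)
print(round(gain, 6))  # 10.0
```

A hundredfold increase of the reference intensity thus raises the interference term tenfold, consistent with the enhancement of almost one order of magnitude quoted above.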


Chapter 19
Nonlinear Light Generation in Localized
Fields Using Gases and Tailored Solids


Murat Sivis and Claus Ropers 


Abstract In Chap. 18, we demonstrated polarization-sensitive imaging at extreme- 
ultraviolet (EUV) wavelengths using a gas-phase high-harmonic generation (HHG) 
source. In a related project, we have investigated new types of gas-phase and solid- 
state EUV light sources employing field localization in plasmonic nanostructures 
and structured targets. Whereas our first results indicate that strong field confine- 
ment leads to exceedingly inefficient high-harmonic generation in gas-phase tar- 
gets, for solid-state media efficient high-harmonic generation is possible in localized 
fields. The latter has great ramifications for new types of high-harmonic generation 
experiments and technological developments. Therefore, our research efforts aim in 
two directions: firstly, the development of new types of solid-state sources for high- 
harmonic generation and, secondly, the application of locally generated solid-state 
high-harmonic signals for spectroscopy and imaging. 


PACS Subject Classification: 42.65.Ky - 81.07.-b - 73.20.Mf 


19.1 Plasmonic Enhancement for EUV Light Generation 


In this section, we discuss EUV light generation in gas-phase and solid targets using
strong laser fields confined in plasmonic nanostructures. In the case of gas-phase
targets, we critically revisit the feasibility of high-harmonic generation in resonant
nanoantennas and tapered hollow waveguides (see Fig. 19.1a and b), as reported
previously [1, 2]. The results of our studies [3–5] show that for gas-phase targets, such as
noble gas atoms, the measurable EUV emission exclusively originates from incoherent
fluorescence instead of coherent high-harmonic radiation, which is due to an
unfavorable conversion efficiency for coherent signals generated in low-density targets
in nanometrically confined excitation volumes. After briefly describing our experiments
and discussing the implications of the outcome, we highlight recent experiments
that overcome the limitations of plasmon-enhanced HHG by exchanging
the gas-phase targets with solid-state materials.

Fig. 19.1 Plasmon-enhanced EUV light generation. a Scanning electron micrograph of an array of
gold bow-tie nanoantennas on a sapphire substrate. Lower inset: Close-up of two bow-tie antenna
pairs. Upper inset: Schematic illustration of the gas excitation in the enhanced fields localized in the
gap between the tips of the triangular antenna arms. b Scanning electron micrograph of a gold plateau
containing a tapered hollow waveguide structure. Lower inset: Close-up of a waveguide, taken from
the entrance aperture side. The waveguide has an entrance aperture size of several micrometers and
tapers down to an exit aperture of a few tens of nanometers in size. Upper inset: Schematic illustration
of gas excitation in the locally enhanced field near the waveguide’s exit aperture. c EUV spectra
from xenon (Xe), argon (Ar) and neon (Ne), excited in the waveguide structure shown in (b). The
argon and neon spectra are up-shifted to avoid overlap. Similar spectra have been observed from
xenon and argon gas using bow-tie antennas for field enhancement

In the experiment, the nanostructures are illuminated under vacuum conditions 
(pressure below 10~° mbar) with low-energy, few-femtosecond laser pulses centered 
at 800 nm wavelength from a 78 MHz Ti:sapphire oscillator. The incident laser field is 
confined in the particular structure, leading to local intensities which are enhanced by
more than two orders of magnitude compared to the incident intensity. Injected noble
gas atoms (backing pressures up to 500 mbar) are excited and ionized in the enhanced
field (see upper insets in Fig. 19.1a and b) via multiphoton absorption and strong-field
excitation. The radiation emitted by the atoms is collected with an EUV flat-field
spectrometer, which spectrally resolves the signal on an imaging microchannel
plate detector. For both excitation schemes, the laser radiation is tightly focused to
achieve incident intensities in excess of 0.1 TW/cm². More detailed information on
the experimental setup and conditions can be found elsewhere [3, 5].

Figure 19.1c shows a set of EUV fluorescence spectra obtained from different 
noble gases excited in a hollow waveguide structure, where all spectral features are 
identified with emission corresponding to electronic transitions of the respective gas 
species. The wavelength positions of the fluorescence lines are marked with full and 
open triangles for emission from neutral and singly ionized atoms, respectively. Sim- 
ilar spectra were measured using bow-tie nanostructures. The reason for the lack of 
coherent emission in these spectra can be found by considering the absolute conver- 
sion efficiency of the HHG process in the localized generation volume. The number 
of gas atoms in the generation volume is too small for a very efficient coherent 
build-up of the harmonic radiation. More specifically, the output power of (phase- 
matched) high-harmonic generation scales quadratically with the pressure-length 
product, which is drastically reduced in implementations based on surface-plasmon 
field confinement in nanostructures. Since incoherent fluorescence scales only lin- 
early with the pressure-length product, it can be efficiently generated and therefore 
is the predominant contribution to the EUV signal. 
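
The consequence of the two scaling laws can be made explicit with a short sketch. It assumes only the proportionalities stated above (coherent signal ∝ (pressure-length product)², incoherent signal ∝ pressure-length product), with all prefactors normalized to a hypothetical reference geometry; the numerical reduction factor is illustrative and not taken from the experiment.

```python
def coherent_to_incoherent_ratio(pl: float, pl_ref: float = 1.0) -> float:
    """HHG-to-fluorescence ratio relative to a reference pressure-length product.

    Assumes coherent ~ (p*L)^2 and incoherent ~ p*L, so the ratio scales ~ p*L.
    """
    coherent = (pl / pl_ref) ** 2
    incoherent = pl / pl_ref
    return coherent / incoherent


# Hypothetical: the pressure-length product is reduced by three orders of magnitude
print(coherent_to_incoherent_ratio(1e-3))
```

Every order of magnitude lost in the pressure-length product therefore costs the coherent signal one additional order of magnitude relative to the fluorescence background.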

Estimates suggest that high-harmonic signal levels in plasmon-enhanced scenarios 
should be several orders of magnitude below that of the simultaneously emitted 
incoherent fluorescence [5, 6]. Therefore, if possible at all, it will be very difficult 
to discriminate coherent from incoherent signals. 

In a recent publication, the authors of the first studies on plasmon-enhanced gas- 
phase high-harmonic generation [1, 2] report that a reproduction of their initial 
experiment resulted in the exclusive observation of incoherent emission and they 
acknowledge that their interpretation of the previous results in [1] was not fully 
correct, since they ignored the possibility of any incoherent contributions [7]. 

As described above, plasmon-enhanced high-harmonic generation in gaseous
media is unfeasible under the given experimental conditions. However, the main
limitation, namely the insufficient gas atom density in the generation volume, can be overcome
by exchanging the gas with condensed-matter targets. Reports on plasmon-enhanced
high-harmonic generation in sapphire [5, 7], silicon [8], and zinc oxide [9] crystals
have re-initiated the experimental research on high-harmonic generation in localized fields
and, moreover, extend the prospects for solid-state high-harmonic generation.


19.2 High-Harmonic Generation and Imaging in Tailored Semiconductors


Beyond enabling high harmonic generation from localized fields in plasmonic 
nanostructures, solids have an important advantage over gaseous targets: The struc- 
ture and chemical composition of solid targets can be tailored via established micro- 
and nano-engineering methods. This enables novel means to shape the generated 
high-harmonic wave field in terms of intensity, phase or polarization. In turn, coher- 
ent signals generated in structured solids can be used for imaging and spectroscopy 
with nanometer spatial resolution. 

Here, we use locally structured and chemically modified zinc oxide (ZnO) and silicon
targets to demonstrate high-harmonic generation and imaging [10]. Figure 19.2a
depicts the experimental setup and principle. A zinc oxide crystal with an array of
microcones milled into its surface (focused ion beam fabrication, see scanning electron
micrograph in Fig. 19.2c) is placed in the focus of an infrared (2.1 µm central
wavelength), 70-femtosecond laser beam with 10 kHz pulse repetition rate. At incident
peak intensities exceeding 30 GW cm⁻², high-harmonic radiation up to the 9th harmonic
order of the fundamental frequency (see Fig. 19.2b) is generated in the target.
The high-harmonic generation is localized to sub-wavelength-sized regions at the
apexes of the cones due to a concentration of the incident laser light by total internal
reflection and interference, leading to an at least 10-fold intensity enhancement. The
localized emission leads to a two-dimensional diffraction pattern in the far field, which
is shown for the 3rd (red spots) and 5th (blue) harmonic in the inset
in Fig. 19.2a.

At this point, we employ two approaches to obtain information on the intensity
distribution at the sample plane. Since the wavelengths of the 3rd and 5th harmonics
are in the visible range, direct imaging using a high-numerical-aperture objective in
combination with a bandpass filter and a charge-coupled device camera is possible.
Figure 19.2d shows the directly imaged 5th harmonic pattern at the sample plane.
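
The statement about the visible range can be checked with a few lines of arithmetic. The sketch below assumes the 2.1 µm central wavelength quoted above and takes roughly 380 to 750 nm as the visible band; both the band limits and the function names are assumptions made for illustration.

```python
def harmonic_wavelength_nm(fundamental_nm: float, order: int) -> float:
    """Wavelength of the n-th harmonic: lambda_n = lambda_0 / n."""
    if order < 1:
        raise ValueError("harmonic order must be >= 1")
    return fundamental_nm / order


def is_visible(wavelength_nm: float, lo_nm: float = 380.0, hi_nm: float = 750.0) -> bool:
    """True if the wavelength lies in the (assumed) visible band."""
    return lo_nm <= wavelength_nm <= hi_nm


FUNDAMENTAL_NM = 2100.0  # 2.1 µm driving wavelength quoted in the text

if __name__ == "__main__":
    for n in (3, 5, 7, 9):
        lam = harmonic_wavelength_nm(FUNDAMENTAL_NM, n)
        print(f"H{n}: {lam:.0f} nm, visible: {is_visible(lam)}")
```

With these assumptions H3 (700 nm) and H5 (420 nm) fall inside the visible band, whereas H7 (300 nm) and H9 (about 233 nm) lie in the ultraviolet.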

In order to illustrate that such an imaging approach is also transferable to har- 
monics with higher photon energies, where refractive optics are not available, we 
reconstruct the exit amplitude from the far-field intensity by applying a phase retrieval 
algorithm as described in Chap. 18. The reconstructed amplitudes are shown in 
Fig. 19.2e. The field of view of the reconstruction is reduced due to limited spa- 
tial coherence and low signal-to-noise ratio of the diffraction pattern. However, the 
central spots clearly register to the arrangement of the cones in Fig. 19.2c, indicating 
the possibilities to image the nanoscale structures in solids. 
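
To make the reconstruction step concrete, the following is a minimal sketch of a generic error-reduction (Gerchberg-Saxton/Fienup-type) iteration with a support constraint. It is not necessarily the algorithm used for Fig. 19.2e (the text refers to Chap. 18 for that); the test object, the support mask and all function names are hypothetical.

```python
import numpy as np


def error_reduction(far_field_amplitude, support, n_iter=100, seed=0):
    """Alternate between the measured Fourier modulus and a real-space support.

    far_field_amplitude : sqrt of the measured far-field intensity (np.fft ordering)
    support             : boolean mask, True where the exit wave may be non-zero
    Returns the exit-wave estimate and the far-field misfit per iteration.
    """
    rng = np.random.default_rng(seed)
    # random-phase start that already satisfies the support constraint
    field = support * np.exp(2j * np.pi * rng.random(support.shape))
    errors = []
    for _ in range(n_iter):
        spectrum = np.fft.fft2(field)
        errors.append(np.linalg.norm(np.abs(spectrum) - far_field_amplitude))
        # Fourier-space projection: keep the phase, impose the measured modulus
        spectrum = far_field_amplitude * np.exp(1j * np.angle(spectrum))
        # real-space projection: set everything outside the support to zero
        field = np.fft.ifft2(spectrum) * support
    return field, np.array(errors)


def demo():
    """Synthetic test: two small emitters inside a square support (hypothetical)."""
    n = 64
    support = np.zeros((n, n), dtype=bool)
    support[24:40, 24:40] = True
    obj = np.zeros((n, n))
    obj[28:31, 27:30] = 1.0
    obj[33:36, 32:37] = 0.7
    amplitude = np.abs(np.fft.fft2(obj))
    _, errors = error_reduction(amplitude, support)
    return errors
```

For this scheme the far-field misfit cannot increase from one iteration to the next, which makes it a convenient convergence monitor; stagnation at a non-zero value is common and is one reason more elaborate algorithms are used in practice.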

In addition to structural modifications, which affect the driving laser field, local
chemical changes also lead to enhanced high-harmonic generation in solid targets,
as shown in Fig. 19.3 for the example of a gallium-ion-implanted silicon target. The
scanning electron microscope image in Fig. 19.3a shows the Fresnel zone plate pattern
written into the silicon surface (dark regions were exposed to low-dose gallium
ions). The gallium ions create defects in the silicon matrix, which lead to a local
enhancement of the high-harmonic generation process, most likely by inducing
mid-gap states in the silicon band structure. Figure 19.3b is an image of the
fifth-harmonic emission at the sample plane, recorded with the objective lens, showing
an enhanced signal in the gallium-implanted regions.

Fig. 19.2 High-harmonic generation in ZnO structures. a Schematic illustration of the experiment.
The far-field intensity distribution, which is recorded with a 3-color CMOS sensor, shows diffracted
third- (red) and fifth- (blue) harmonic signals emitted from a cone grating (see SEM image in
c). The upper inset shows a 3-dimensional illustration of the microcones on the crystal's surface.
b High-harmonic spectrum from a structured target illuminated with amplified 2.1-µm laser pulses.
The harmonic orders are labeled H3-H9. Harmonics H7 and H9 were measured with a different
spectrometer exhibiting a higher sensitivity in the spectral range below 350 nm. c Scanning electron
micrograph of a 2D-grating of microcones on ZnO. d, e Comparison of the directly imaged fifth
harmonic signal (Direct) using an objective and a phase-retrieval reconstruction of the exit amplitude
(Reconstructed) using the far-field intensity

Fig. 19.3 Fresnel zone plate high harmonic source in silicon. a Scanning electron micrograph
of a gallium-implanted Fresnel zone plate pattern (darker region) on a silicon wafer. b Intensity
distribution of the fifth harmonic signal (H5, compare spectrum in Fig. 19.2b) at the sample plane.
c Focus scan of imaging objective showing the azimuthally-integrated fifth-harmonic signal as a
function of distance to the sample plane (z = 0)

Generally, imaging of high-harmonic signals from locally structured and chemi- 
cally modified solid targets represents a novel means to investigate local strong-field 
phenomena in condensed matter systems at nanometer scales, with potential capa- 
bilities for temporally or spectroscopically resolved studies. In turn, such sources 
also allow for the control of the generated high-harmonic wave field, as the Fresnel 
zone target in Fig. 19.3 illustrates. 

The source leads to a focusing of the generated harmonic radiation to diffraction 
limited spot sizes. Figure 19.3c shows a focus scan of the azimuthally integrated 
intensities at different distances to the sample plane along the optical axis. Further 
results show that also the phase and polarization of the generated emission can be 
modified in such schemes. 

In conclusion, our study demonstrates that strong-field effects are generally 
excitable in localized laser fields, e.g. by employing plasmonic nanostructures for 
field enhancement. However, gas-phase high harmonic generation lacks conversion 
efficiency in such scenarios due to a small number of gas atoms in the localized generation
volume and the resulting insufficient coherent build-up of the high-harmonic
signal. Instead of coherent high-harmonic emission, we observed bright incoher- 
ent extreme-ultraviolet fluorescence, which stems from multiphoton and strong-field 
excited and ionized gas atoms. 

In contrast to gas-phase targets, in solid media an efficient generation of high har- 
monics in localized fields is possible. In structured ZnO and silicon targets we concen- 
trated the driving laser field in microscopic cones and wedges and observed enhanced 
coherent high-harmonic emission from sub-wavelength sized generation volumes. 
In a second approach, we demonstrated that chemical modifications also influence 
the high-harmonic generation in solids. Direct imaging or phase-retrieval reconstruc- 
tion of the localized coherent emission will enable new means to study strong-field 
phenomena in solid-state systems. Additionally, tailored solids will enable the devel- 
opment of new types of sources for tailored high-harmonic wave fields. 
Chapter 20
Wavefront and Coherence Characteristics of Extreme UV and Soft X-ray Sources


Bernd Schäfer, Bernhard Flöter, Tobias Mey and Klaus Mann 


Abstract The first part of this chapter describes setups and results for the determination
of wavefront and beam parameters of different EUV sources (free-electron
lasers, HHG sources, synchrotron radiation) with self-supporting Hartmann sensors.
Among others, we present a sensor applied to the alignment of the ellipsoidal mirror at FLASH
beamline 2, yielding a reduction of the rms wavefront aberrations by more than a
factor of 3. In the second part we report on the characterization of the free-electron
laser FLASH at DESY by a quantitative determination of the Wigner distribution
function. The setup, comprising an ellipsoidal mirror and a movable extreme-UV-sensitive
CCD detector, enables the mapping of the two-dimensional phase spaces corresponding
to the horizontal and vertical coordinate axes, respectively. Furthermore,
an extended setup utilizing a toroidal mirror for complete 4D Wigner reconstruction
has been realized and tested using radiation from a multimode Nd:VO4 laser.


PACS Subject Classification: 42.15.Dp · 42.25.Kb · 42.55.Vc


20.1 Introduction 


Electromagnetic radiation in the extreme UV and soft X-ray spectral range is of 
steadily increasing importance in fundamental research and industrial applications. 
For instance, the molecular structure of proteins and viruses has become accessible 
by coherent diffractive imaging techniques; currently, lithographic processes for the 
microchip production are being adapted to the extreme UV wavelength of 13.5 nm. 
For both examples, a comprehensive beam characterization is an essential condition 
for an ideal use of the available photons, and only exact knowledge of the illu- 
minating radiation field allows for further improvements of spatial resolution and 
reliability in nanoscale imaging and structuring. To this end, pioneering developments
in large-scale and table-top light sources of extreme ultraviolet (EUV) radiation
must be complemented by advanced beam characterization
techniques. The Laser-Laboratorium Göttingen has developed metrological tools and
analysis procedures for proper characterization of the propagation behaviour of
short-wavelength radiation. This contribution addresses wavefront measurements on
free-electron lasers (FELs) and high-harmonic generation (HHG) sources emitting in the extreme UV
and soft X-ray range. The diagnostic schemes based on Hartmann sensing enable,
on the one hand, comprehensive beam analysis including prediction of focal
distributions and, on the other hand, fine adjustment of beamline optics for optimization
of peak intensities. Additionally, the coherence of laser beams is analyzed by measurements
of the Wigner distribution function. This method is applied to the photon
beam of the free-electron laser FLASH, resulting in a complete characterization of its
propagation properties, including both global and local degrees of spatial coherence.


20.2 Wavefront Metrology and Beam Characterization 
with Hartmann Sensors 


20.2.1 Hartmann Wavefront Sensing 


The wavefront or phase distribution of a radiation field carries quantitative information
about its directional distribution, and is therefore of utmost importance for the
design of beam transport optics. On-line recording of the wavefront also enables
an optimization of the beam focusability by precision alignment of optical
elements. Other relevant areas are the monitoring and possible reduction of thermal
lensing effects, on-line resonator adjustment, or "at wavelength" testing of optics
including Zernike analysis. The wavefront of a radiation source is defined as the
surface w(x, y) that is normal to the local direction of energy propagation in the
electromagnetic field [1], i.e. normal to the Poynting vector S(x, y) at the measurement
plane (cf. Fig. 20.1, left). In case of highly coherent radiation, w(x, y) is a
surface of constant phase. The phase distribution Φ(x, y) is then related to the
wavefront according to


$$
\Phi(x, y) = \frac{2\pi}{\lambda}\, w(x, y), \tag{20.1}
$$

where λ is the mean wavelength of the light.

Fig. 20.1 Left: Definition of the wavefront of a radiation field; right: Measurement principle of
Hartmann-type wavefront sensors (cf. text)

A variety of techniques has been developed for wavefront sensing. Interferometric
devices, such as Twyman-Green, common-path, lateral-shear, Mach-Zehnder
or Sagnac interferometers, can be applied over the full wavelength spectrum
for which detectors and optical materials are available, provided that the coherence is
sufficient for detectable levels of interference. Alternatively, phase gradient measurement
techniques, in particular Hartmann or Hartmann-Shack, can be used with both
coherent and incoherent beams. In these instruments, the gradients of either wavefront
or phase are measured, from which the two-dimensional phase distribution can
be reconstructed. The Hartmann principle [1] is based on a subdivision of a beam into
a number of beamlets (see Fig. 20.1, right). This is accomplished either by an opaque
screen with pinholes placed on a regular grid (Hartmann sensor), or by a lenslet or
micro-lens array (Hartmann-Shack sensor). The latter offers a better radiation
collection efficiency and a wider dynamic range. For this reason, Hartmann-Shack
sensors are already widely used for Vis, NIR and UV radiation. However, since no
transmissive optical materials for the fabrication of micro-lens arrays are available at
extreme UV and soft X-ray wavelengths, only the Hartmann approach using pinhole
arrays is appropriate in this spectral region. The spot distribution produced by the
segmenting array is recorded at a distance l by a position-sensitive detector, most
commonly a CCD camera. The position of the beamlet centroids is determined within
each subaperture, both for the beam under test and a reference wavefront. The latter
is provided preferably by a well-collimated laser beam (plane wave), or a well-defined
spherical wave, using e.g. the output of a monomode fiber or the Airy pattern
produced behind a diffracting pinhole. The displacement of the spot centroid Δx
divided by the distance l yields the local wavefront gradient β_x inside one subaperture
relative to the reference wavefront (see Fig. 20.1, right). By direct integration
or modal fitting techniques using Zernike or Legendre polynomials, the wavefront
w(x, y) is reconstructed from these local gradients [2, 3] and afterwards corrected
for tip/tilt and defocus [1]. A detailed description of the wavefront reconstruction
methods is given in the references. The main advantages of the Hartmann technique
compared to interferometric devices are


• suitability for fully and partially coherent beams,
• no requirement of spectral purity,
• no ambiguity with respect to 2π increments in phase angle,
• compact and robust design.
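
The step from spot displacements to a wavefront described above can be sketched as follows. This is a minimal illustration of modal fitting with a low-order polynomial basis (tilt, defocus, astigmatism) rather than the Zernike or Legendre bases of [2, 3]; the function names and the synthetic spherical test wave are assumptions.

```python
import numpy as np


def local_slopes(spot_positions, reference_positions, distance):
    """Local wavefront gradients: centroid displacement divided by the distance l."""
    spot = np.asarray(spot_positions, dtype=float)
    ref = np.asarray(reference_positions, dtype=float)
    return (spot - ref) / distance


def fit_quadratic_wavefront(x, y, beta_x, beta_y):
    """Least-squares fit of w = c0*x + c1*y + c2*x^2 + c3*x*y + c4*y^2 to slope data."""
    x = np.asarray(x, dtype=float).ravel()
    y = np.asarray(y, dtype=float).ravel()
    zeros, ones = np.zeros_like(x), np.ones_like(x)
    # partial derivatives of the five basis functions with respect to x and y
    grad_x = np.column_stack([ones, zeros, 2 * x, y, zeros])
    grad_y = np.column_stack([zeros, ones, zeros, x, 2 * y])
    design = np.vstack([grad_x, grad_y])
    target = np.concatenate([np.ravel(beta_x), np.ravel(beta_y)])
    coeffs, *_ = np.linalg.lstsq(design, target, rcond=None)
    return coeffs


def demo_spherical_wave(radius=5.0, pitch=250e-6, distance=0.2, n=9):
    """Synthetic spot pattern of a spherical wave with radius of curvature `radius` (m)."""
    idx = (np.arange(n) - n // 2) * pitch
    gx, gy = np.meshgrid(idx, idx)
    x, y = gx.ravel(), gy.ravel()
    reference = np.column_stack([x, y])                 # plane-wave reference spots
    spots = reference + distance * reference / radius   # each spot shifted by l * slope
    beta = local_slopes(spots, reference, distance)
    return fit_quadratic_wavefront(x, y, beta[:, 0], beta[:, 1])
```

For the spherical test wave w = (x² + y²)/(2R), the fit returns vanishing tilt and astigmatism terms and c2 = c4 = 1/(2R); subtracting the tilt and defocus terms corresponds to the tip/tilt and defocus correction mentioned in the text.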

Hartmann-Shack and Hartmann wavefront sensors can be successfully applied for
real-time laser beam characterization, since they record simultaneously (i.e. in
single pulses) the wavefront (directional distribution) w(x, y) and the beam profile
or intensity distribution I(x, y) of a radiation field [4, 5]. The latter is obtained by
summation over pixel data inside the individual subapertures, at a reduced spatial
resolution given by the pitch of the segmenting array. As has been demonstrated for
visible laser radiation, in case of coherent sources the knowledge of beam profile
I(x, y) and wavefront w(x, y) allows for calculation of the relevant beam parameters
[5-7]. For this purpose the moments method described in [5, 7, 8] is applied: The
central second spatial (x, y) and angular (u, v) moments are computed from the
intensity distribution and the local wavefront slopes β_x, β_y according to


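$$
\langle x^2 \rangle = \frac{\sum_{i,j} \left(x_{ij} - \langle x \rangle\right)^2 I_{ij}}{\sum_{i,j} I_{ij}}, \tag{20.2}
$$

$$
\langle xu \rangle = \frac{\sum_{i,j} \left(x_{ij} - \langle x \rangle\right)\left(\beta_{x,ij} - \langle \beta_x \rangle\right) I_{ij}}{\sum_{i,j} I_{ij}}, \tag{20.3}
$$

$$
\langle u^2 \rangle = \frac{\sum_{i,j} \left(\beta_{x,ij} - \langle \beta_x \rangle\right)^2 I_{ij}}{\sum_{i,j} I_{ij}}
+ \left(\frac{\lambda}{2\pi}\right)^2 \frac{1}{4 \sum_{i,j} I_{ij}} \sum_{i,j} \frac{\left(\partial_x I\right)_{ij}^2}{I_{ij}}, \tag{20.4}
$$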


where ⟨x⟩ and ⟨β_x⟩ are the first moments of x and β_x [7], respectively; the index (ij)
denotes the subaperture. From the second moments the beam width d, divergence θ,
beam propagation factor M², beam waist diameter d_0, waist position z_0 and Rayleigh
length z_R are computed according to the following equations [7]:


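$$
d = 4\sqrt{\langle x^2 \rangle}, \qquad \theta = 4\sqrt{\langle u^2 \rangle}, \tag{20.5}
$$

$$
M^2 = \frac{4\pi}{\lambda}\sqrt{\langle x^2 \rangle \langle u^2 \rangle - \langle xu \rangle^2}, \qquad d_0 = \frac{4\lambda M^2}{\pi\theta}, \tag{20.6}
$$

$$
z_0 = \frac{z_R \langle xu \rangle}{\left|\langle xu \rangle\right|}\sqrt{\left(\frac{d}{d_0}\right)^2 - 1}, \qquad z_R = \frac{d_0}{\theta}. \tag{20.7}
$$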
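
As an illustration of the moments method, the sketch below evaluates the second moments and the derived beam parameters for the x direction from gridded intensity and slope data. It is a minimal implementation under stated assumptions: a uniform grid, a numerical derivative of the intensity via `np.gradient`, and a waist-position sign that simply follows the sign of ⟨xu⟩ (whether positive means upstream or downstream depends on the slope sign convention, which is not fixed here). The function name and dictionary keys are hypothetical.

```python
import numpy as np


def beam_parameters_x(intensity, beta_x, x, wavelength):
    """Second-moment beam parameters along x.

    intensity, beta_x : 2-D arrays indexed [iy, ix] on the subaperture grid
    x                 : 1-D array of uniformly spaced x coordinates (m)
    """
    I = np.asarray(intensity, dtype=float)
    beta = np.asarray(beta_x, dtype=float)
    x = np.asarray(x, dtype=float)
    X = np.broadcast_to(x, I.shape)
    P = I.sum()

    x_mean = (X * I).sum() / P
    b_mean = (beta * I).sum() / P
    x2 = ((X - x_mean) ** 2 * I).sum() / P
    xu = ((X - x_mean) * (beta - b_mean) * I).sum() / P

    # diffraction contribution to <u^2>, evaluated only where I is non-negligible
    dI_dx = np.gradient(I, x[1] - x[0], axis=1)
    ok = I > 1e-12 * I.max()
    diffraction = (wavelength / (2 * np.pi)) ** 2 * (dI_dx[ok] ** 2 / I[ok]).sum() / (4 * P)
    u2 = ((beta - b_mean) ** 2 * I).sum() / P + diffraction

    d = 4 * np.sqrt(x2)
    theta = 4 * np.sqrt(u2)
    m2 = 4 * np.pi / wavelength * np.sqrt(x2 * u2 - xu ** 2)
    d0 = 4 * wavelength * m2 / (np.pi * theta)
    z_r = d0 / theta
    # sign taken from <xu>; its physical meaning depends on the slope convention
    z0 = np.sign(xu) * z_r * np.sqrt(max((d / d0) ** 2 - 1.0, 0.0))
    return {"d": d, "theta": theta, "M2": m2, "d0": d0, "zR": z_r, "z0": z0}
```

For an ideal Gaussian beam sampled 5 m away from a 100 µm waist (radius), this sketch recovers M² close to 1, a waist diameter close to 200 µm and a waist distance close to 5 m, which is a useful sanity check before applying it to measured data.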


Moreover, once the intensity and the phase distributions are known from a Hart- 
mann measurement, solving Fresnel-Kirchhoff’s integral allows numerical propaga- 
tion of the beam, i.e. computation of intensity distributions at different propagation 
distances z [9]: 


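$$
I(x, y, z) = \left| \frac{ik}{2\pi z} \iint \sqrt{I(x', y')}\; e^{ik w(x', y')}\; e^{-\frac{ik}{2z}\left[(x - x')^2 + (y - y')^2\right]}\, dx'\, dy' \right|^2
$$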


Here x, y and x', y' are the Cartesian coordinates in two parallel planes separated
by z. Thus, in particular the profile at the beam waist position can be predicted, which
is in many cases hardly accessible for high-power lasers, both due to the high intensity
and the small size of the focal spot.
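
A numerical propagation of this kind can be sketched with fast Fourier transforms. The example below uses the paraxial transfer-function form of the Fresnel integral instead of direct integration; this is a swapped-in but equivalent technique for well-sampled fields, and the sign convention of the transfer function (chosen to match a kernel with a negative exponent) as well as all names are assumptions.

```python
import numpy as np


def fresnel_propagate(intensity, wavefront, wavelength, dx, z):
    """Propagate a field given by intensity and wavefront over a distance z.

    intensity, wavefront : 2-D arrays on a square grid with pixel size dx (m)
    Returns the intensity distribution in the plane at distance z.
    """
    k = 2 * np.pi / wavelength
    field = np.sqrt(intensity) * np.exp(1j * k * wavefront)
    ny, nx = field.shape
    fx = np.fft.fftfreq(nx, d=dx)
    fy = np.fft.fftfreq(ny, d=dx)
    FX, FY = np.meshgrid(fx, fy)
    # paraxial transfer function; the constant on-axis phase is dropped because
    # only the intensity is of interest (the opposite convention is its conjugate)
    transfer = np.exp(1j * np.pi * wavelength * z * (FX ** 2 + FY ** 2))
    propagated = np.fft.ifft2(np.fft.fft2(field) * transfer)
    return np.abs(propagated) ** 2


def gaussian_test_beam(n=256, dx=4e-6, w0=50e-6):
    """Gaussian intensity profile with waist radius w0 and a flat wavefront."""
    x = (np.arange(n) - n // 2) * dx
    X, Y = np.meshgrid(x, x)
    return np.exp(-2 * (X ** 2 + Y ** 2) / w0 ** 2)
```

Propagating the Gaussian test beam by one Rayleigh length should halve its peak intensity while conserving the total power, which provides a quick check of grid size and sampling before the routine is applied to Hartmann data.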
20.2.2 EUV Wavefront Sensor for FEL Characterization


Since currently operating (soft) X-ray FELs are based on the self-amplified spontaneous
emission (SASE) process, which builds up the laser emission from noise, their
beam characteristics can differ significantly from pulse to pulse. Therefore, there is
a strong requirement for single-pulse photon diagnostics and online characterization
of the FEL beam propagation parameters [10, 11]. For this reason a Hartmann wavefront
sensor for the extreme UV spectral range was developed and applied for photon
diagnostics, beam propagation and optics alignment of the FLASH free-electron laser
in cooperation with DESY Photon Science/Hamburg [12]. The device was designed
to operate from 4 to 40 nm, which is within the accessible FLASH wavelength range.
It consists of a pinhole array (Hartmann plate) made of a 20 µm thick nickel foil
with orthogonally arranged electroformed holes (dia. 75 µm, pitch 250 µm) in front
of a CCD camera at a distance of 200 mm behind the array (see Fig. 20.2). This
distance as well as the dimensions of the Hartmann plate represent a compromise
between attainable wavefront sensitivity at short wavelengths and spatial resolution
at long wavelengths. For converting the soft X-rays into visible light the CCD chip is
coated with a fluorescent coating (Gd2O2S:Tb, emission wavelength 545 nm). The
Hartmann sensor is adjustable both laterally and with respect to tip and tilt. The
device is self-supporting and compact (240 mm × 240 mm × 300 mm) and can be
attached behind user experiments.
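
One aspect of the design compromise can be made plausible with a rough diffraction estimate for a single beamlet. The sketch below uses the Fraunhofer (Airy) first-zero diameter 2.44 λ l / D as an assumption; at the short-wavelength end the Fresnel number of the 75 µm holes is of order one, so the estimate is only indicative there. Function names are hypothetical.

```python
def airy_disc_diameter(wavelength, distance, hole_diameter):
    """First-zero diameter of the Airy pattern behind a circular hole (far field)."""
    return 2.44 * wavelength * distance / hole_diameter


def fresnel_number(wavelength, distance, hole_diameter):
    """Fresnel number a^2 / (lambda * l) with hole radius a."""
    return (hole_diameter / 2) ** 2 / (wavelength * distance)


HOLE = 75e-6      # hole diameter from the text (m)
PITCH = 250e-6    # hole pitch from the text (m)
DISTANCE = 0.2    # plate-to-CCD distance from the text (m)

if __name__ == "__main__":
    for lam in (4e-9, 13.5e-9, 40e-9):
        spot = airy_disc_diameter(lam, DISTANCE, HOLE)
        nf = fresnel_number(lam, DISTANCE, HOLE)
        print(f"lambda = {lam * 1e9:4.1f} nm: spot ~ {spot * 1e6:5.0f} um, "
              f"Fresnel number ~ {nf:.2f}, pitch = {PITCH * 1e6:.0f} um")
```

Under these assumptions the estimated spot diameter grows from about 26 µm at 4 nm to about 260 µm at 40 nm, i.e. it becomes comparable to the 250 µm pitch at the long-wavelength end of the design range, where neighbouring spots begin to overlap.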

For absolute at-wavelength calibration of the Hartmann sensor a proper reference 
wavefront independent of the mentioned pulse-to-pulse fluctuations is essential. For 
this purpose a spherical wavefront is prepared by spatial filtering, placing a diffracting 
pinhole (dia. several µm) in the vicinity of the focal spot of the FEL beam. The
Hartmann sensor is positioned at a certain distance behind this pinhole, ensuring that 
its full field of view is illuminated by the central Airy disc. Thereafter, the reference 
spot distribution is registered (cf. Fig. 20.2, right). A temporally stable spherical wave 
as described above can also be utilized to assess the sensitivity of the Hartmann sensor. 




Fig. 20.2 Left: EUV Hartmann sensor with integrated tip/tilt and lateral adjustment; the inset
shows a close-up of the Hartmann pinhole plate. Right: spot pattern of the reference wavefront
(λ = 13.5 nm). The central pinhole is omitted for tip/tilt alignment

Fig. 20.3 Intensity profiles and wavefronts of single pulses at FLASH BL2 without focusing mirror
(λ = 7 nm)


At FLASH beamline BL2 the single-pulse wavefront repeatability was determined
by recording a series of 100 single FEL pulses behind a 5 µm pinhole at an emission
wavelength of 13.8 nm. A root-mean-square deviation Δw_rms = 0.12 nm (λ/116)
was evaluated, defining an upper limit for the achievable measurement precision of
the wavefront sensor.

After these qualification tests the Hartmann sensor was used to analyze the FEL
beam of FLASH, at first without focusing mirror [13]. Beam profiles and wavefronts
recorded at BL2 for single pulses at a wavelength of 7 nm are displayed in Fig. 20.3,
showing relatively strong pulse-to-pulse fluctuations, as is typical for the SASE process.
The saddle-like shape of the wavefront indicates an astigmatism of the beam.
Nevertheless, the peak-to-valley (w_pv) and root-mean-square (w_rms) wavefront
aberrations computed after tip/tilt and defocus subtraction of the measured wavefronts
are relatively low (w_rms ≈ λ/10).

Neglecting influences from partial coherence (cf. Sect. 20.3), the beam propagation
parameters can be computed from the measured intensity and wavefront
distributions according to the moments method described above, (20.2)-(20.7).
Corresponding beam characteristics are compiled in Table 20.1, taking the average over
20 single pulses. Despite the observed astigmatism (waist separation of about 10 m between x and y),
the evaluated M² value of 1.15 is remarkably low, which can be explained by the
small higher-order wavefront expansion coefficients and the smooth intensity profile.

In contrast to these data, wavefront measurements performed behind beamline
optics can lead to much less satisfactory results, caused by an insufficient fine
adjustment of the optics. An example is shown in Fig. 20.4 (left) for an ellipsoidal
focusing mirror at BL3 of FLASH: for this carbon-coated grazing-incidence mirror
with 2 m focal length the w_pv and w_rms values of the recorded wavefront are more
than an order of magnitude higher than without focusing optics. However, by on-line
wavefront diagnostics the EUV Hartmann sensor allows for fine-tuning the mirror
alignment. As seen from Fig. 20.4, the dominating strong astigmatism introduced by
the optical element could be completely removed by real-time optimization of the pitch
and yaw angles of the ellipsoidal mirror. After alignment a w_pv of 12 nm and a w_rms
of 1.1 nm (< λ/10) could be achieved.
Table 20.1 Beam parameters of FLASH computed from Hartmann data (BL2, λ = 7 nm)

Beam parameter                            X                 Y
w_pv [nm]                                 5.3 ± 0.69
w_rms [nm]                                0.67 ± 0.09
Beam propagation parameter M²             1.15 ± 0.08
Beam propagation parameter M²_i           1.23 ± 0.1        1.1 ± 0.1
Beam width d [mm]                         6 ± 0.2           4.4 ± 0.1
Waist position z_0,i [m]                  −109.2 ± 0.9      −99.2 ± 1.4
Rayleigh length z_R [mm]                  3760 ± 484        5090 ± 731
Waist diameter d_0,i [µm] (2nd moment)    200 ± 20          223 ± 25
Divergence θ [µrad]                       55 ± 2            44 ± 2

[Fig. 20.4 panels: Starting position (yaw 0 mrad, pitch 0 mrad); During alignment (yaw 0.65 mrad, pitch −1 mrad); Final position (yaw 0.65 mrad, pitch −0.95 mrad). Panel annotations: w_pv = 27.6 nm, w_rms = 3.9 nm; w_pv = 12.7 nm, w_rms = 1.1 nm]


Fig. 20.4 Wavefronts measured at different steps of the alignment procedure of the ellipsoidal
focusing mirror at FLASH beamline BL3 (λ = 13.3 nm). Note that the scale w(x, y) is enlarged
by a factor of ten for the starting position to account for the very large initial astigmatism


After optimized fine-adjustment of the focusing element, a Fresnel-Kirchhoff 
integration of the Hartmann data allows for propagation of the beam, as described in 
Sect. 20.2.1. Corresponding results recently obtained at FLASH II are displayed in
Fig. 20.5 for three z-positions close to the beam waist [14]. The propagated profiles 
are currently compared with PMMA imprints taken at the respective positions by 
J. Chalupsky et al. (Acad. of Sci., Czech Republic). 
Fig. 20.5 Left: Wavefront and intensity distribution of FLASH 2 (FL24, λ = 13.5 nm) recorded
~2 m behind KB optics with EUV Hartmann sensor; right: Profiles obtained by Fresnel-Kirchhoff
back propagation of the Hartmann data to the beam waist


20.2.3 Beam Characterization of High-Harmonic Sources 


The EUV Hartmann wavefront sensor successfully applied for beam characterization
at FELs was also employed to investigate EUV radiation generated by high-harmonic
generation (HHG) sources. Especially for their use in CDI experiments, a proper
alignment is crucial, since successful reconstruction of phase objects can only be
achieved if the phase distortions of the probe beam are negligible. In cooperation
with Claus Ropers' group, the propagation of the 25th harmonic (λ = 32 nm) of
a Ti:sapphire laser was studied after passing a toroidal grating that combines spectral
filtering and focusing. The Hartmann sensor was positioned behind the focus,
capturing the EUV wavefront while the angle of incidence of the harmonic on the
grating was varied. As for the FEL beamline mirrors, the recorded wavefront initially
shows a strong astigmatism (cf. Fig. 20.6, left), which can be minimized by real-time
alignment. A description of the corresponding beam propagation by matrix methods
[15] yields good agreement with the experimental data, especially for the astigmatic
waist difference (cf. Fig. 20.6 right, blue line). From the theoretical computations,
the achievable beam intensity is estimated as a function of the incidence angle. As
expected, the highest photon flux is obtained for an angle of incidence where the
astigmatic aberration disappears. Even a slight misalignment of 0.5°
leads to a decrease of the achievable intensity by 50% compared to its optimum.
Thus, in order to achieve short exposure times and prevent reconstruction errors in
subsequent CDI experiments, this alignment procedure plays an essential role. With
the correspondingly optimized HHG source, it was possible to successfully image
test samples at the diffraction limit [16].
Fig. 20.6 Left: Wavefront (3D) and intensity profile (below) of a HHG beam (25th harmonic).
Right: Astigmatic waist difference Δz and achievable irradiance I_max plotted as a function of the
angle of incidence on toroidal grating. The theoretical curve Δz(α) (solid blue line) lies slightly
above the experimental values (blue dots) [15]


20.2.4 Thermal Lensing of X-ray Optics 


In addition to static aberrations of optical components given by figure errors or 
misalignment, beam propagation can also be deteriorated by transient distortions of 
the wavefront introduced by the beam itself due to local heating and surface defor- 
mation (thermal lensing). In order to investigate the influence of this effect on the 
performance of X-ray optics, in particular high power mirrors to be employed for the 
European XFEL/Hamburg, we have performed time-resolved wavefront measure- 
ments in pump-probe experiments at the ESRF/Grenoble [9]. In this investigation a 
Si mirror sample was exposed to an intense 15 keV beam, and its thermally induced 
surface deformation was monitored by measuring the wavefront of a reflected optical 
laser probe beam with the help of a Hartmann-Shack wavefront sensor (cf. Fig. 20.7). 
By reconstructing and back-propagating the wavefront, the deformed surface could
be retrieved for each time step. Thus, the dynamics of the created heat bump, especially
its build-up, maximum amplitude and relaxation, were analyzed with a surface
height resolution in the nanometer range. For the investigated Si sample, deformations
induced by a bunch train of X-ray pulses were on the order of several tens of nanometers
(peak-to-valley); a relaxation time constant of ~30 µs was obtained. The data were
interpreted taking into account results of finite element method simulations. Due to
its robustness and simplicity this method can find further applications at new X-ray
light sources (FEL), or help to gain a deeper understanding of the thermodynamic behavior
of highly excited materials under non-equilibrium conditions.

[Fig. 20.7, right: panel labels t0 and t0 + 40 µs]

Fig. 20.7 Left: Schematic view of pump-probe setup to determine thermal wavefront distortions
of high power X-ray optics at ESRF. Right: surface topology of a Si mirror reconstructed from
wavefront measurements for different delays between the X-ray pump and infrared probe pulse.
The decay of the heat bump on the mirror proceeds at a time scale of several tens of microseconds


20.3 Wigner Distribution for Diagnostics of Spatial 
Coherence 


Apart from the beam profile and shape of the wavefront, the degree of lateral coherence
of a beam has a crucial impact on the minimum achievable focal spot size.
Whereas both wavefront and irradiance distribution may be obtained from a single-shot
experiment, this is not true for the spatial degree of coherence γ, which is,
like the mutual intensity J [17], defined on a four-dimensional space of lateral position
x and mutual distance s. Earlier approaches to spatial coherence measurement
at FELs utilize Young's double-pinhole experiment [18] to derive γ from the
fringe visibility in the corresponding interference patterns. However, any substantial
mapping of (x, s)-pairs requires a vast number of pinhole arrangements and image
recordings to be evaluated, which is very inefficient with respect to the
experimental effort. Therefore, those measurements have only been carried out for a
few selected points x within the beam profile and one or two perpendicular directions
of the pinhole separation vector s. An alternative approach is based on the investigation
of lateral correlation of local intensity fluctuations in the beam profile [19].
Although this method is more efficient than Young's experiment, only the modulus
of γ can be specified.
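
For reference, the evaluation of a single double-pinhole measurement can be sketched as follows. The relation |γ| = V (I₁ + I₂) / (2 √(I₁ I₂)) between fringe visibility V and the modulus of the degree of coherence is the textbook two-beam interference result; the synthetic fringe pattern (without diffraction envelope) and the function names are assumptions for illustration.

```python
import numpy as np


def fringe_visibility(profile):
    """Visibility V = (I_max - I_min) / (I_max + I_min) of a fringe profile."""
    i_max, i_min = np.max(profile), np.min(profile)
    return (i_max - i_min) / (i_max + i_min)


def coherence_modulus(visibility, i1, i2):
    """|gamma| from the visibility and the intensities behind the two pinholes."""
    return visibility * (i1 + i2) / (2.0 * np.sqrt(i1 * i2))


def synthetic_fringes(gamma_abs, i1, i2, n=1001):
    """Ideal two-beam fringes over two periods, ignoring the pinhole envelope."""
    phase = np.linspace(0.0, 4.0 * np.pi, n)
    return i1 + i2 + 2.0 * np.sqrt(i1 * i2) * gamma_abs * np.cos(phase)
```

Each such evaluation yields |γ| for only one position and one pinhole separation, which illustrates why mapping the full four-dimensional dependence with this approach requires so many separate recordings.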

In order to determine the full complex degree of coherence, an alternate strategy 
has been employed to recover the mutual intensity J (x, s), i.e. through a measurement 
of the Wigner distribution function (WDF) h(x, u), representing the two-dimensional 
Fourier transform of the mutual coherence function [20]. Prior to the presentation of 
experimental setups and results the theoretical background of the applied formalism 
will be briefly summarized. 


20.3.1 Theory 


The Wigner distribution A(x, u) of a quasi-monochromatic paraxial beam is defined 
in terms of the mutual intensity J (x, s) as a two-dimensional Fourier transform of 
the latter [21] 
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EN? s S\ iku. 
h(x, u) = (;-) [ol pet 5) ets ds. ds}, 


where x = (x, y) ands = (s,, sy) denote spatial and u = (u, v) angular coordinates 
in a plane perpendicular to the direction of beam propagation, and k is the mean 
wave number of light. Propagation of the Wigner distribution h and its 4D Fourier 
transform h through static and lossless paraxial systems from an input (index i) to an 
output (index o) plane, signified by a 4 x 4 optical ray propagation ABCD matrix 
S, writes [22, 23]: 


h;i(Dx — Bu, —Cx, + Au) = h,(x, u) (20.8) 

hi(A"w + C't, B’w+ D't) =h,(w, t), (20.9) 

where (w, t) are the Fourier space coordinates corresponding to (x, u). Considering 
a set {p} of parameters, defined by the optical system being employed to gener- 
ate projections of the phase space, and a set of irradiance profiles 1p} recorded at 


positions which are connected to an arbitrary reference plane via corresponding ray 
transformation matrices $p}, one obtains: 


FT ~ = 
[rine u) du dv = Lip) (x) <> hip (W, t= 0) = Lip (w) (20.10) 


and from (20.9) and (20.10) [24]: 
hree (Af) W, Bin w) = Lp) (w). (20.11) 


Propagation through free space in beam direction (z axis) is described by the ABC D 
matrix 
gaf: (20.12) 
= 01 , 


corresponding to the detector position in the experimental arrangement described 
below. Thus, (20.10) becomes 


hrep(W, z: w) = (w), (20.13) 


representing a four-dimensional mapping relation between Fourier transformed
intensity distributions and the Wigner distribution of the beam (Projection Slice
Theorem). Following this equation, the phase space of h̃ is filled with data from
intensity profiles measured at several z-positions, as for instance obtained from a
caustic scan of the beam. A subsequent four-dimensional inverse Fourier transform
of h̃ results in the Wigner distribution function.
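The step from (20.11) to (20.13) can be written out explicitly. The following sketch assumes that the entries of the free-space matrix (20.12) are to be read as 2 × 2 blocks, i.e. A = D = 1 (unit matrix), B = z · 1 and C = 0; the subscript z on the matrix blocks is notation introduced here for illustration.

```latex
\begin{aligned}
A_{z}^{T}\mathbf{w} &= \mathbf{w}, \qquad B_{z}^{T}\mathbf{w} = z\,\mathbf{w} \\
\Rightarrow\quad
\tilde h_{\mathrm{ref}}\!\left(A_{z}^{T}\mathbf{w},\, B_{z}^{T}\mathbf{w}\right)
 &= \tilde h_{\mathrm{ref}}(\mathbf{w},\, z\,\mathbf{w})
  = \tilde I_{z}(\mathbf{w})
\end{aligned}
```

Each profile recorded at a position z thus supplies the values of h̃ on the two-dimensional plane t = z w of the four-dimensional Fourier space; scanning z alone sweeps out a three-parameter family of points.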
The global degree of coherence K is calculated by

$$K = \frac{\lambda^{2}}{P^{2}} \int h(\mathbf{x}, \mathbf{u})^{2}\, \mathrm{d}x\, \mathrm{d}y\, \mathrm{d}u\, \mathrm{d}v, \tag{20.14}$$
(P = total power of the beam) and the mutual coherence function is derived by a 
two-dimensional Fourier back-transform 


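$$J(\mathbf{x}, \mathbf{s}) = \int h(\mathbf{x}, \mathbf{u})\, e^{-i k\, \mathbf{u}\cdot\mathbf{s}}\, \mathrm{d}u\, \mathrm{d}v. \tag{20.15}$$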


The coherence lengths l_x and l_y are deduced as half width at half maximum of
J(0, 0, s_x, 0) and J(0, 0, 0, s_y), respectively.
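The half-width evaluation described above can be sketched numerically. The helper `hwhm` and the synthetic Gaussian slice below are illustrative assumptions, not the authors' evaluation code; the sketch assumes a sampled slice that peaks at s = 0 and locates the half-maximum crossing by linear interpolation.

```python
import numpy as np

def hwhm(s, profile):
    """Half width at half maximum of |profile|, assumed to peak at s = 0.

    s must be sorted in ascending order; the crossing of the half-maximum
    level on the s >= 0 side is located by linear interpolation.
    """
    s = np.asarray(s, dtype=float)
    p = np.abs(np.asarray(profile))
    p = p / p.max()
    right = s >= 0
    s_r, p_r = s[right], p[right]
    below = np.nonzero(p_r < 0.5)[0]
    if below.size == 0 or below[0] == 0:
        raise ValueError("half-maximum crossing not found on the s >= 0 side")
    i = below[0]
    # Interpolate between the last sample above and the first sample below 0.5
    frac = (p_r[i - 1] - 0.5) / (p_r[i - 1] - p_r[i])
    return s_r[i - 1] + frac * (s_r[i] - s_r[i - 1])

# Synthetic test slice: Gaussian |J(0, 0, s_x, 0)| with a known HWHM of 9.0
# (arbitrary length units; hypothetical data, not the measured FLASH values).
s_x = np.linspace(-50.0, 50.0, 2001)
sigma = 9.0 / np.sqrt(2.0 * np.log(2.0))
J_slice = np.exp(-s_x**2 / (2.0 * sigma**2))
l_x = hwhm(s_x, J_slice)
```

Taking the modulus first lets the same helper be applied to a complex-valued mutual coherence function.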


20.3.2 Experimental Results 


As mentioned above, the Wigner distribution can be derived from intensity profiles
along the propagation direction of a beam. Figure 20.8 shows the corresponding
experimental setup employed for caustic measurements of the FEL FLASH. Here,
focusing is achieved by a carbon-coated ellipsoidal mirror with a focal length of 2 m.
The EUV sensor consists of a phosphor screen imaged onto a CCD chip by a
10x magnifying microscope. A motorized translation stage allows for movement of
the detector in z-direction within a range of 250 mm, covering up to ten Rayleigh
lengths z_R in both directions around the beam waist. During the caustic measurement,
FLASH was running in single bunch mode at a wavelength of λ = 25 nm. Typically,
profiles are acquired at more than 100 different z-positions around the beam waist.
FEL beam profiles at three positions in the focal region behind the ellipsoidal
mirror are displayed in Fig. 20.8, indicating pronounced vertical stripes which can
be attributed to a residual ripple-like corrugation of the mirror surface [25], while for


[Figure 20.8, schematic labels: FEL beam, phosphor screen, microscope 10x, CCD camera]
Fig. 20.8 Left: Experimental setup for the Wigner distribution measurement at FLASH and selected
profiles close to the beam waist at λ = 25 nm. Right: Measurement of the beam profile at mean
waist position together with reconstruction from the obtained Wigner distribution. Ellipses indicate
the coherent fraction of the beam area
Table 20.2 Beam propagation parameters of FLASH evaluated from Wigner measurements (beam
waist diameter d_0, Rayleigh length z_R, beam quality factor M², coherence length l and global degree
of coherence K)


             d_0 [µm]   z_R [mm]   M²    l [µm]   K
x-direction  67         12.2       8.6   9.0      0.032
y-direction  53         4.3        4.6   11.6     0.032


the y-direction the profiles are much smoother. In the focal position the
structure vanishes into a uniform distribution. From the Wigner distribution function
computed according to the previous section we reconstruct beam profiles at arbitrary
positions z in the following fashion: h(x, u) is propagated via (20.8) applying the
propagation matrix S_z from (20.12); subsequently the near field of the beam is generated
by the integration I_z(x) = ∫ h_z(x, u) du dv. The resulting reconstructed intensity
distribution at the average waist position is displayed in Fig. 20.8 (right) together with
the corresponding experimental profile. The good agreement between measured
and reconstructed intensity distributions confirms the validity
of the obtained Wigner distribution. From the mutual coherence function J of the
beam, reconstructed at the average waist position by application of (20.15), the coherence
lengths l_x in horizontal and l_y in vertical direction can be calculated as HWHM
values of the 1D slices J(0, 0, s_x, 0) and J(0, 0, 0, s_y). The dotted curves in Fig. 20.8
show the corresponding ellipses s_x²/l_x² + s_y²/l_y² = 1 and x²/d_x² + y²/d_y² = 1, indicating
the coherence area and total beam area, respectively. It is apparent that the
coherence length corresponds to only a small fraction of the beam diameter. The
exact values are given in Table 20.2. The coherence length in the vertical beam direction
is found to be significantly larger than in the horizontal direction. Furthermore, the
global degree of coherence is calculated as K = 0.032, revealing a low
coherence of the FLASH beam.
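The reconstruction recipe (propagate h with the free-space matrix, then integrate over the angles) can be illustrated with a one-dimensional toy model. The sketch below is not the authors' code: it assumes a fully coherent Gaussian beam with a hypothetical waist radius `w0`, uses the free-space form of (20.8), h_z(x, u) = h_0(x − z u, u), and checks the resulting beam radius against the textbook Gaussian-beam law.

```python
import numpy as np

lam = 25e-9                 # wavelength [m], as in the FLASH caustic measurement
k = 2 * np.pi / lam         # wave number
w0 = 30e-6                  # waist radius [m]; hypothetical value for illustration
z_R = np.pi * w0**2 / lam   # Rayleigh length of the ideal Gaussian beam

def wigner_waist(x, u):
    """Wigner distribution of a coherent 1D Gaussian beam at its waist (unit power)."""
    return (k / np.pi) * np.exp(-2 * x**2 / w0**2 - 0.5 * (k * w0 * u)**2)

def intensity_at(z, x, u):
    """I_z(x) = integral of h_z(x, u) du, with h_z(x, u) = h_0(x - z u, u)."""
    X, U = np.meshgrid(x, u, indexing="ij")
    h_z = wigner_waist(X - z * U, U)          # free-space shear of phase space
    return h_z.sum(axis=1) * (u[1] - u[0])    # Riemann sum over the angle u

def beam_radius(x, intensity):
    """1/e^2 radius from the second moment (w = 2 sigma for a Gaussian)."""
    mean = np.sum(x * intensity) / np.sum(intensity)
    var = np.sum((x - mean)**2 * intensity) / np.sum(intensity)
    return 2 * np.sqrt(var)

x = np.linspace(-10 * w0, 10 * w0, 1001)
u = np.linspace(-6 / (k * w0), 6 / (k * w0), 601)

w_num = beam_radius(x, intensity_at(z_R, x, u))
w_ref = w0 * np.sqrt(2.0)   # analytic result w(z_R) = sqrt(2) * w0
```

For a measured, partially coherent Wigner distribution the analytic `wigner_waist` would be replaced by an interpolation of the reconstructed data on the sheared coordinates.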

In comparison with existing coherence measurements based on Young’s double
slit, good agreement is found for the coherence lengths, but the global degree of
coherence is lower by one order of magnitude. This discrepancy can, at least partly,
be explained by the fact that an ensemble of beam profiles is employed for the 
Wigner evaluation, resulting in beam properties in terms of mean values. In contrast, 
in Young’s experiment individual pulses are analyzed which yield corresponding 
maximum values [18]. Another issue leading to underestimated values of the global 
coherence is the incomplete 3D mapping using only profiles from a standard caustic 
measurement (cf. next section). 


20.3.3 4D Wigner Measurements 


The described reconstruction of the Wigner distribution based on beam profiles
acquired from free space propagation according to (20.13) covers only a 3D subset
of the phase space, as z is the only free parameter in the ray propagation matrix
(20.12). Although such a reduced mapping is sufficient for some special cases, covering
e.g. separable or quasi-homogeneous beams and coherent beams with zero
twist, the validity of these conditions is not known a priori. Therefore, an extended
approach has been established, using, in addition to the detector z-position, the orientation
angle φ of an astigmatic optical element as mapping parameter. The corresponding
ray matrices S_{z,φ} [24, 26] permit, according to (20.11), a complete 4D
map of the phase space. The extended setup including a non-rotationally symmetric
element is shown in Fig. 20.9. It applies a rotatable toroidal mirror, introducing a
fourth degree of freedom into the system. Thus, by choosing the measurement parameters
(rotation angles, camera positions) properly, it is possible to reconstruct the entire
4D Wigner distribution, also for non-separable beams.

Fig. 20.9 Setup for a 4D measurement of the Wigner distribution

In order to test and qualify the 4D approach, a diode-pumped Nd:YVO4 laser operating
at its fundamental wavelength λ = 1064 nm (continuous wave) was employed,
which accomplishes the selective excitation of Hermite-Gaussian modes by pumping
the laser crystal at different lateral positions (“mode generator”). TEM00, TEM10,
TEM02, TEM03 and an uncorrelated superposition of TEM10 and TEM01 were investigated
in a setup similar to Fig. 20.9, using a polished aluminum toroidal mirror
(radii 200 and 300 mm) and a standard CCD camera as detector. Rotation of the mirror
and movement of the camera were achieved by servo motors. The automated
measurement for each laser mode consists of 410 beam profiles in total, corresponding
to 10 rotation angles and 41 z-positions. In Fig. 20.10 the Wigner distribution
functions reconstructed from the described measurement are depicted for the TEM02
Fig. 20.10 Qualification of 4D Wigner formalism using near IR mode generator for different
Hermite-Gaussian beams: Wigner distributions of TEM02 and TEM03 modes resulting from theory
and experiment are shown for comparison


Table 20.3 Global degree of coherence K for different Hermite-Gaussian beams in theory and
experiment


             TEM00   TEM10   TEM02   TEM03   TEM10 + TEM01
Theory       1       1       1       1       0.5
Experiment   0.95    1.06    0.98    0.90    0.46


and TEM03 modes, together with the expected theoretical WDFs of the analyzed
beams. The experimental results correspond closely to the theory. Quantitatively,
too, the expected global degree of coherence K = 1 is reproduced with
an accuracy better than 10%. As seen from Table 20.3, this also holds for the other
investigated modes.
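The theoretical values K = 1 for a pure mode and K = 0.5 for an uncorrelated equal-weight superposition of two modes can be reproduced with a one-dimensional analogue of (20.14), K = (λ/P²) ∫ h² dx du. The following sketch is only an illustration under stated assumptions: it uses the known closed-form Wigner distributions of the two lowest 1D Hermite-Gaussian modes and a hypothetical mode radius `w`; it does not reproduce the 4D evaluation of the experiment.

```python
import numpy as np

lam = 1064e-9            # wavelength [m] of the Nd:YVO4 mode generator
k = 2 * np.pi / lam
w = 100e-6               # mode radius [m]; hypothetical value for illustration

def wigner_hg(n, x, u):
    """Wigner distribution of the 1D Hermite-Gaussian mode n (0 or 1), unit power."""
    rho = 2 * x**2 / w**2 + 0.5 * (k * w * u)**2
    laguerre = 1.0 if n == 0 else 1.0 - 2.0 * rho   # L_0(2 rho) and L_1(2 rho)
    return (-1)**n * (k / np.pi) * np.exp(-rho) * laguerre

def global_coherence(h, dx, du):
    """1D analogue of (20.14): K = (lam / P^2) * integral of h^2 dx du."""
    power = h.sum() * dx * du
    return lam * (h**2).sum() * dx * du / power**2

x = np.linspace(-5 * w, 5 * w, 801)
u = np.linspace(-8 / (k * w), 8 / (k * w), 801)
dx, du = x[1] - x[0], u[1] - u[0]
X, U = np.meshgrid(x, u, indexing="ij")

h0, h1 = wigner_hg(0, X, U), wigner_hg(1, X, U)
K_pure0 = global_coherence(h0, dx, du)             # fundamental mode
K_pure1 = global_coherence(h1, dx, du)             # first-order mode
K_mix = global_coherence(0.5 * (h0 + h1), dx, du)  # uncorrelated 50:50 mixture
```

Because the Wigner distributions of orthogonal modes have vanishing overlap integral, the mixture yields half the value of a pure mode, in line with the theory row of Table 20.3.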

After successful qualification, the 4D Wigner method was adapted to the extreme
UV wavelength of 13.5 nm by employing a toroidal Mo/Si multilayer mirror (curvature
radii 4145 and 4050 mm, cf. Fig. 20.9), serving to further characterize FEL
beams. Recently, first 4D measurements of the WDF have been performed at beamline
FL24 of FLASH 2. The data were acquired for the unfocused beam, employing
a tilt angle of 8° on the toroid. In total 500 profiles at 50 z-positions and 10 rotation
angles were recorded within the 250 mm range of a motorized linear stage. Furthermore,
various apertures placed upstream of the Wigner setup were utilized in order to
modify beam size and spatial coherence properties. The 4D Wigner reconstruction
yielded preliminary evaluation results for the global degree of coherence between
K = 0.1 for a 5 mm aperture and K = 0.25 for a 3 mm aperture, which is one order
of magnitude higher than with the 3D approach. For better comparison to Young’s
measurements the square root of K is more appropriate, leading to √K = 0.3 for
the 5 mm and √K = 0.5 for the 3 mm aperture diameter, respectively. These values
agree qualitatively quite well with results obtained from interference fringe contrast in
Young’s double pinhole experiments. The observed higher degree of coherence can at
least in part be attributed to the focusing optics at FLASH II (high quality Kirkpatrick-Baez
mirror instead of the slightly corrugated ellipsoidal mirror at FLASH I). Further
clarifying work is in progress.


20.4 Conclusion and Outlook 


The feasibility of a compact Hartmann wavefront sensor to be employed for beam 
characterization of FELs and HHG sources emitting in the EUV spectral region has 
been demonstrated. The device accomplishes simultaneous recording of both wave- 
front and intensity distributions, allowing for an optimization of the beam transport 
by fine-tuning the focusing optics. For both FELs and HHG sources we demon- 
strated that Hartmann wavefront sensor assisted alignment can considerably reduce 
the astigmatic focal difference induced by grazing incidence mirrors and gratings. 
The resulting decrease of wavefront error leads to higher spot brightness, resulting
in an enhanced CDI performance. In the case of the FEL FLASH a reduction of
residual wavefront aberrations to w_rms ≈ λ/10 could be achieved. Wavefronts and
intensity profiles of single FLASH pulses were recorded, enabling an analysis
of beam fluctuations of the SASE FEL. From these data characteristic propagation
parameters of the FLASH beam were computed by applying the moments
method. Fresnel-Kirchhoff integration allowed for numerical beam propagation, in
particular for an analysis of profiles in the waist region, which is hardly accessible
to a direct measurement. Future activities will involve extension of the Hartmann
approach to harder X-rays in cooperation with European XFEL/Hamburg, as well 
as an improved prediction of the propagation characteristics by employing a sensor 
with higher dynamics and spatial resolution. 

In addition, it has been shown that the propagation of partially coherent radia- 
tion is successfully described by the formalism of the Wigner distribution function. 
In comparison to existing studies, this is achieved without the need of simplifying 
assumptions on the beam. It is expected that the obtained comprehensive beam char- 
acterization leads to further improvements in the field of CDI and related techniques. 
Inclusion of coherence information into the propagation formalism will enable a 
successful prediction of focal intensity distributions and spot sizes even for sources 
with relatively poor coherence properties. 


Acknowledgements The authors would like to thank Barbara Keitel and Elke Plönjes (DESY Photon
Science) for their support during the measurements at FLASH as well as for stimulating discussions. 
Chapter 21
Laboratory-Scale Soft X-ray Source for Microscopy and Absorption Spectroscopy


Matthias Müller and Klaus Mann 


Abstract Soft X-ray microscopy and absorption spectroscopy are extremely use- 
ful tools for high-resolution imaging and chemical analysis of samples in various 
scientific fields. However, due to the required high photon flux of soft X-ray radi- 
ation, up to now, both methods are almost exclusively performed at synchrotron 
sources. Thus, great efforts have been made to develop table-top sources emitting 
in the soft X-ray spectral range (λ = 1–5 nm). Here, the development of a
laser-produced plasma source from a pulsed gas jet is presented, enabling the construction
of an almost debris-free, compact and long-term stable X-ray source. Based on this
source a compact soft X-ray microscope (spatial resolution 50 nm) operating at a
wavelength of λ = 2.88 nm was built and applied for imaging of various test and
biological objects. In addition, a laboratory-scale NEXAFS spectrometer has been 
established, allowing for reliable analysis of different absorption edges at photon 
energies between 250 and 1250 eV. 
21.1 Table-Top Soft X-ray Source Using a Pulsed Gas Jet


The laser-produced plasma source for the generation of intense soft X-ray radiation
is based on a gas-puff target and a Nd:YAG laser system (wavelength λ = 1064
nm, pulse energy 700 mJ, pulse width 7 ns, repetition rate 5 Hz) as schematically
depicted in Fig. 21.1. The radiation characteristics depend strongly on the gases
used (see Fig. 21.3a, b): Photon emission spectra of low atomic number gases (e.g.
oxygen and nitrogen) consist of individual, isolated lines, whereas those of gases
[Figure 21.1, label: Nd:YAG laser, 7 ns, 1064 nm, 600 mJ]
Fig. 21.1 Schematic drawing (left) and photograph (right) of the table-top laser-produced soft
X-ray plasma source at Laser-Laboratorium Göttingen


with a high atomic number (argon, krypton, xenon) are quasi-continuous due to the 
higher number of possible electronic transitions. 

The pulsed gas jet is created by a conical nozzle (diameters d_1 ≈ 550 µm, d_2 ≈
300 µm, cone half angle 7°) behind a fast valve. The latter is based on the Proch-Trickl
setup [1], consisting of a piezo disk translator to generate short gas pulses
(t_open = 900 µs), allowing for a background pressure of about 5 · 10^−? mbar during
operation (gas pressure p = 10–20 bar).

As compared to alternative laser-produced plasma sources employing solid or
liquid targets, the gas jet based source offers several advantages. In particular, these
are:


• Low debris generation, i.e. cleanliness,
• Continuous supply of target material,
• Long-term stability.


However, due to the reduced particle density of the gas jet, the peak brilliance is
lower, as the plasma size increases to several hundred micrometers.
Progress in the enhancement of gas density has been achieved by using cluster beam
targets [2] or double-stream gas puff targets [3]. Another successful approach is the
generation of a so-called “barrel shock”, applying a small background pressure p_b to
the supersonic flow emanating from the nozzle ([4], see Fig. 21.2a). On passing this
barrel shock system, particles become locally concentrated, forming high-density
regions. By focusing the laser beam into the high-density region behind one of these
shocks, a higher number of gas atoms can be ionized, resulting in a brighter and
smaller plasma (see Fig. 21.2b). The peak brilliance of the source is increased by one
order of magnitude to 3.15 · 10^1? photons/(mm² mrad s) for the isolated nitrogen line
at λ = 2.88 nm.

The emission characteristics of a soft X-ray plasma are not only affected by the 
target gas and its particle density, but also by the laser parameters. The influence of 
the laser pulse length was investigated, employing two lasers of about the same pulse 
energy (500 mJ, λ = 1064 nm), but with different pulse durations of 7 ns and 170
ps [5]. As seen from the compilation in Fig. 21.3, the spectral characteristics of the 
plasma emission are clearly affected by the pulse duration (i.e. power density): The 




[Figure 21.2, panel (a) labels: nozzle diameter d = 0.3 mm, p_b = 170 mbar, p_0 = 11 bar, barrel shock, Mach disc, typical plasma position]
Fig. 21.2 a Schematic representation of the plasma generation in a “barrel shock”: The local
confinement of target gas atoms by an X-ray transparent background gas (He, p_b > 10 mbar) allows
for a plasma ignition at a greater distance from the nozzle as compared to the standard expansion
into vacuum (p_b < 10⁻⁴ mbar). b Pinhole camera images indicating plasma intensities at λ = 2.88
nm, superimposed on Schlieren images of the gas jet under vacuum conditions and in an ambient
He atmosphere, respectively [4]


[Figure 21.3, panels: (a) nitrogen, (b) krypton, (c) argon; horizontal axes: Photon Energy [eV]; labelled ion lines include N VI, N VII and Ar X to Ar XV]
Fig. 21.3 Influence of laser pulse duration on emission spectra of a nitrogen and b krypton from
both ns (red) and ps (blue) laser plasmas (target gas pressure 10 bar, 200 nm Al-filtered, average of
100 pulses). c Comparison of measured and calculated argon emission spectra for ns and ps laser
and corresponding pinhole camera images [5]


brightness is strongly enhanced, for some emission lines by more than a factor of 10. 
The picosecond spectra are shifted considerably towards shorter wavelengths since 
the higher power density results in a higher degree of ionization. 
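The difference in power density between the two lasers can be estimated from the quoted pulse parameters. The sketch below assumes a rectangular pulse shape (peak power = pulse energy / pulse duration) and an identical focal spot area for both lasers; neither assumption is stated in the text, and the function name is chosen here for illustration only.

```python
def peak_power(pulse_energy_j, pulse_duration_s):
    """Peak power of a pulse, approximated as energy divided by duration."""
    return pulse_energy_j / pulse_duration_s

energy = 0.5                          # pulse energy [J], about the same for both lasers
p_ns = peak_power(energy, 7e-9)       # nanosecond laser, roughly 7e7 W
p_ps = peak_power(energy, 170e-12)    # picosecond laser, roughly 3e9 W
ratio = p_ps / p_ns                   # equals 7 ns / 170 ps, about 41

print(f"ns laser: {p_ns:.2e} W, ps laser: {p_ps:.2e} W, ratio: {ratio:.1f}")
```

Under these assumptions the picosecond laser delivers roughly forty times the power density of the nanosecond laser.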

By comparing the spectra with model calculations using a magneto-hydrodynamic
code, electron temperatures and densities were obtained, indicating the maximum
achievable degree of ionization of the plasma. As an example, Fig. 21.3c shows
measured and calculated emission spectra of the argon plasma for both ns and ps
laser excitation. The average electron temperature resulting from the simulation is 66
Fig. 21.4 Pinhole camera images of krypton plasma (average of 100 pulses) taken at
90° and 30° to the incident laser beam (scale bar: 250 µm)


eV for the ps plasma (33% larger than for ns), whereas the electron density is about
3 times higher (22.4 · 10^1? e⁻/cm³). Along with the higher degree of excitation, the
ps laser-produced plasmas are also considerably smaller, as could be monitored with
a soft X-ray sensitive pinhole camera (see Fig. 21.3c inset).

Recent investigations on the angular emission characteristics of the laser-produced 
plasma have demonstrated that the emission of soft X-rays in backward direction is 
strongly favored. This can be explained by a shorter path length and thus, a reduced 
reabsorption of the radiation by the gas jet in direction of the laser beam. The plasma 
intensity is enhanced by a factor of 1.6 when the emitted soft X-ray radiation is utilized 
under an angle of 30° with respect to the incoming laser beam, as compared to the 
previously chosen 90° geometry (see Fig. 21.4). In addition, the effective irradiation 
area of the plasma becomes 4x smaller, and the positional stability of the source is 
increased by a factor of 5. 


21.2 Soft X-ray Microscopy 


Benefiting from the high absorption contrast between carbon and oxygen, transmission
X-ray microscopy in the spectral range of the “water window” (λ = 2.3–4.4 nm)
is ideally suited for the investigation of biological samples. Using Fresnel zone plates
as highly magnifying objectives, spatial resolutions in the range of 10 nm have been
achieved at synchrotron sources [6]. However, in order to pave the way for a wider
dissemination of soft X-ray microscopy, there is also a clear need for table-top
systems. Although considerable progress has already been achieved in this field [7–10],
a further compaction and simplification of these microscopes is still necessary.
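Wavelengths and photon energies are used interchangeably in this chapter; the conversion E = hc/λ can be sketched as follows. The constant is the standard value of hc in eV nm; the helper name is chosen here for illustration.

```python
HC_EV_NM = 1239.841984  # Planck constant times speed of light, in eV nm

def photon_energy_ev(wavelength_nm):
    """Photon energy in eV for a vacuum wavelength given in nm."""
    return HC_EV_NM / wavelength_nm

e_line = photon_energy_ev(2.88)      # nitrogen line used by the microscope
e_short = photon_energy_ev(2.3)      # short-wavelength limit of the water window
e_long = photon_energy_ev(4.4)       # long-wavelength limit of the water window

print(f"2.88 nm -> {e_line:.1f} eV, water window: {e_long:.0f} to {e_short:.0f} eV")
```

The 2.88 nm line thus corresponds to a photon energy of about 430 eV, and the water window spans roughly 280 to 540 eV.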

Making use of the long-term stable and nearly debris-free laser-produced plasma
from a pulsed nitrogen gas jet target, an extremely compact soft X-ray microscope
operating in the “water window” region at the wavelength λ = 2.88 nm was installed
and tested [11, 12]. The setup of this table-top soft X-ray microscope is depicted
in Fig. 21.5a. It consists basically of the laser-produced plasma source described in
Sect. 21.1, an ellipsoidal condenser mirror, a Fresnel zone plate objective fabricated
by standard lithographic techniques, and a back-illuminated CCD camera sensitive
to soft X-ray radiation, all integrated into a vacuum system with a base pressure of
10⁻⁶ mbar.

The soft X-ray radiation emitted from the nitrogen plasma is filtered by a titanium 
(Ti) filter to block out-of-band radiation, ensuring monochromatic irradiation of 
Fig. 21.5 a Photograph of the table-top soft X-ray microscope. b Spatial intensity profiles of soft
X-ray radiation at λ = 2.88 nm for different positions along the optical axis behind the ellipsoidal
condenser mirror; the minimum spot diameter at z = 0 mm (object plane) is about 400 µm (FWHM).
c–f Soft X-ray micrographs of c Siemens star, d geo-colloids, e bacterium Deinococcus radiodurans
(DSM no. 20539) and f alga Trachelomonas oblonga (SAG 1283-11). Magnification and exposure
time vary from 175 to 500 and 5 min to 60 min, respectively. a, e, f reproduced from [12], with
permission of AIP Publishing. d reprinted with permission from [11], Optical Society of America


the sample at λ = 2.88 nm wavelength. The adjustment of the grazing incidence
condenser mirror was optimized by the measurement of intensity profiles at various
positions along the optical axis. Figure 21.5b displays corresponding distributions
for different camera positions. A Gaussian-like spatial profile with a diameter of
about 400 µm (FWHM) is measured at the focal plane, representing the object plane
of the soft X-ray microscope.

In order to assess the performance of the microscope a Siemens star test pattern
was imaged. Figure 21.5c shows a corresponding micrograph, indicating an almost
uniform illumination over a field of view of about 10 µm. Structures with a size of
about 50 nm are resolved in all directions [12]. Furthermore, various biological and
geological samples were investigated (see Fig. 21.5d, e). The characteristic shape of
these objects is clearly visible, but almost no internal structure is apparent due to 
the thickness of the samples. The signal-to-noise ratio of the micrographs is rather 
low due to the relatively low brilliance of the plasma. However, the presented sys- 
tem offers various opportunities for scalability of the photon flux (see Sect. 21.1), 
thereby maintaining the compactness of the microscope and especially the inherent 
cleanliness of the soft X-ray source. 


21.3 X-ray Absorption Spectroscopy 


Another prominent application of soft X-ray radiation is absorption spectroscopy,
probing the fine structure of X-ray absorption edges (Near Edge X-ray Absorption
Fine Structure, NEXAFS) by exciting core level electrons to higher lying unoccupied
states. As the energy levels of both initial and final states depend on the involved
molecular bonds and their chemical environment, the spectral features of the near-edge
fine structure represent a “molecular fingerprint” of the sample. Thus, NEXAFS
spectroscopy is a very common analytical method for compositional surface analysis,
representing nowadays one of the most important applications of synchrotron
radiation [13]. In contrast to soft X-ray microscopy, however, it has been conducted
only a few times with laboratory-scale sources until now [14–17], although almost
identical results as with radiation from storage rings are achievable.

Fig. 21.6 Schematic representation (top) and photograph (bottom) of the
laboratory-scale NEXAFS spectrometer (schematic labels: plasma, Al-filter, sample, slit, grating, CCD sensor, 652 mm)

At the Laser-Laboratorium Göttingen (LLG) a compact spectrometer based on the 
laser-produced soft X-ray plasma source was developed and utilized for NEXAFS 
measurements. Figure 21.6 shows the current setup of the table-top spectrometer,
which makes use of a concave flat-field grating and a soft X-ray sensitive CCD 
camera. At synchrotrons absorption spectroscopy is conducted mainly by recording 
the total electron yield using monochromatic radiation for excitation. In contrast, 
the broadband radiation emitted from lab-scale sources allows for a polychromatic 
spectroscopic approach. 
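In such a polychromatic transmission scheme, all photon energies are recorded at once on the CCD. One common way to evaluate the data, assumed here rather than taken from the text, is to compare a spectrum recorded through the sample with a reference spectrum recorded without it and to compute the optical density OD(E) = −ln(I_sample/I_ref). The function name, the `floor` parameter and the synthetic absorption edge below are hypothetical.

```python
import numpy as np

def optical_density(i_sample, i_reference, floor=1e-12):
    """Optical density -ln(I_sample / I_ref), evaluated per spectral channel.

    Both spectra are clipped at a small positive floor to avoid log(0)
    in channels without signal.
    """
    i_sample = np.clip(np.asarray(i_sample, dtype=float), floor, None)
    i_reference = np.clip(np.asarray(i_reference, dtype=float), floor, None)
    return -np.log(i_sample / i_reference)

# Synthetic example (hypothetical data): a smooth absorption step near 285 eV
energy_ev = np.linspace(280.0, 300.0, 201)
mu_d = 0.2 + 0.8 / (1.0 + np.exp(-(energy_ev - 285.0) / 0.3))  # absorption x thickness
reference = np.full_like(energy_ev, 1000.0)                     # counts without sample
sample = reference * np.exp(-mu_d)                              # counts with sample

od = optical_density(sample, reference)
```

Since every spectral channel is recorded simultaneously, source fluctuations largely cancel in the ratio as long as both spectra are averaged over comparable numbers of pulses.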

Up to now, a considerable number of NEXAFS investigations have been performed
at the soft X-ray absorption edges of various elements. A brief overview of the spectra
measured so far at the carbon K-edge is given in Fig. 21.7a, showing polyimide
[16], humic substances and the organic matter of a Luvisol [18]. Differences in
spectral features point to varying influences of e.g. aromatic, phenolic, and carbonyl
functional groups.
Fig. 21.7 NEXAFS spectra acquired at various absorption edges with the compact spectrometer
at LLG: a C K-edge spectra of polyimide, humic acid, fulvic acid and a luvisol bulksoil [19], b Cl
L-edge spectrum of NaCl [20] indicating EXAFS oscillations, in comparison with corresponding
synchrotron results [21], c overview spectrum of PCMO [22], d O K-edge and Mn L-edge spectra of
various Mn compounds [23] and e O K-edge and Fe L-edge spectra of hematite samples of different
thickness


Furthermore, NEXAFS studies were conducted at the absorption edges of Si, S,
Cl (see Fig. 21.7b, [20]), Ca [24], O [22, 23], and, for the first time with a laboratory-scale
setup, at the L-edges of Mn, Fe and Cu [22, 23, 25] at photon energies >500 eV
(Fig. 21.7c–e). In all cases, there is an excellent agreement of spectral features compared
to spectra obtained with synchrotron radiation. The energy resolution of the
spectrometer is about E/ΔE ≈ 450 at a photon energy of 430 eV. However, a higher
spectral resolution is necessary, e.g. for the analysis of Fe oxides to resolve the underlying
splitting of the pre-edge features around 530 eV and the L3,2 peaks between
700 and 725 eV.
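The quoted resolving power translates into an absolute energy width as follows. The second line is only an extrapolation under the assumption, not stated in the text, that the same resolving power holds near the Fe L-edges at about 710 eV.

```latex
\begin{aligned}
\Delta E\,(430\ \mathrm{eV}) &\approx \frac{430\ \mathrm{eV}}{450} \approx 0.96\ \mathrm{eV} \\
\Delta E\,(710\ \mathrm{eV}) &\approx \frac{710\ \mathrm{eV}}{450} \approx 1.6\ \mathrm{eV}
  \quad \text{(assuming constant } E/\Delta E\text{)}
\end{aligned}
```

Spectral features separated by less than these widths cannot be distinguished, which illustrates why a higher resolution is needed for the fine structure of the Fe oxides.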

Besides the investigation of different absorption edges, the soft X-ray source
was applied for time-resolved NEXAFS measurements on the perovskite-type manganite
Pr0.7Ca0.3MnO3 (PCMO). Pump-probe experiments reveal minute changes of the
oxygen K-edge, stemming from an optically induced phase transition [22], which
compare well with synchrotron data [26].

Due to the short mean free path of soft X-rays in air (only a few millimeters),
NEXAFS experiments with the table-top spectrometer have so far been performed in
a high vacuum system, excluding a number of interesting samples from spectroscopic
investigations. To overcome this limitation, a new sample chamber was constructed
using two silicon nitride membranes as vacuum windows (see Fig. 21.8a and b) to
measure samples also under atmospheric conditions.
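The consequence of such a short mean free path can be illustrated with the exponential attenuation law T = exp(−d/L). The attenuation length of 3 mm used below is a hypothetical placeholder standing in for "a few millimeters"; actual values depend strongly on photon energy and gas composition.

```python
import math

def transmission(path_mm, attenuation_length_mm):
    """Fraction of photons transmitted through a gas path (exponential attenuation)."""
    return math.exp(-path_mm / attenuation_length_mm)

L_AIR_MM = 3.0                          # hypothetical attenuation length in air [mm]
t_1mm = transmission(1.0, L_AIR_MM)     # short gap between two membranes
t_10mm = transmission(10.0, L_AIR_MM)   # centimeter-scale air path

print(f"1 mm air: {t_1mm:.2f}, 10 mm air: {t_10mm:.3f}")
```

Under this assumption a path of 1 mm still transmits most of the radiation, whereas a centimeter of air leaves only a few percent, which motivates keeping the atmospheric section between the membranes as short as possible.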

NEXAFS spectra have been recorded under different conditions in this sample 
chamber, i.e. vacuum, air and helium, respectively, on e.g. the calcium L-edge and 
oxygen K-edge of crystalline calcium chloride tetra- or hexahydrate [25].
NEXAFS measurements are feasible under all conditions at the calcium L-edge (see
Fig. 21.8c). However, the oxygen signal is absent in vacuum and air (see Fig. 21.8d).
During measurement in vacuum, the crystalline structure is changing along with 
the decreasing atmospheric pressure from a tetra- or hexahydrate into a water-free 
distorted “rutile-structure” [27], thus no traces of water are detected. In air, these 
hydrate structures are still present, but could not be measured due to the strong 
absorption above the nitrogen K-edge. Therefore, helium purging turns out to be 
essential for the proper investigation of sensitive samples. 
Fig. 21.8 a Drawing and b photograph of the helium purged sample chamber. c Ca L-edge and
d O K-edge spectra of CaCl2 · H2O recorded for different conditions. Reproduced from [25], with
the permission of the American Vacuum Society
21.4 Outlook


At present, NEXAFS studies with the table-top spectrometer are conducted in
transmission mode, requiring very thin samples (~100 nm). Experiments in reflection
geometry would also allow for investigation of thick samples, which cannot be
prepared for measurements in transmission. Additionally, taking advantage of the
short penetration depth of soft X-rays under grazing incidence (<30 nm), measurements
in reflection would yield highly surface-sensitive information. Thus, the setup
shall be modified to accomplish NEXAFS experiments both in transmission and
reflection mode.

The soft X-ray microscope shall be extended to a STED integrated X-ray 
nanoscope. Combining both methods allows for comprehensive studies of biological 
objects due to the complementary contrast of both imaging techniques. Thus, in addi- 
tion to pure imaging of the sample also information about its chemical composition 
can be gained without transferring the sample between different setups. 

Moreover, the table-top EUV source can be used for metrological applications at 
the microlithographic wavelength λ = 13.5 nm, e.g. for material testing or actinic
sensor qualification. 
Chapter 22
Multilayer Zone Plates for Hard X-ray Imaging


Markus Osterhoff and Hans-Ulrich Krebs 


Abstract This chapter reviews progress both in the fabrication of multilayer zone 
plate optics for focusing X-rays, as well as in imaging experiments using these optics. 
The fabrication based on pulsed laser deposition is accompanied by analytical and 
numerical treatment of X-ray propagation to control volume diffraction effects. On 
the imaging side, different schemes are presented; these include scanning-scattering 
with focused X-rays, holography, as well as recent advances in lens-enhanced phase- 
reconstruction. 


22.1 From Focusing to Imaging 


In this chapter we review progress on hard X-ray imaging methods using Multilayer 
Zone Plate (MZP) optics. First, we have developed Multilayer Laue Lenses (MLLs), 
as a means to one-dimensionally focus soft X-rays produced at table-top laser-driven 
plasma sources, see also Chap. 21. Focusing experiments using a depth graded MLL 
consisting of Ti/ZrO2 layers were performed with the table-top soft X-ray source at a
wavelength of 2.88 nm, achieving a focal spot size of 280 nm [1, 2].

Then, first focusing experiments using MLLs and MZPs have demonstrated that 
two-dimensional focusing of hard X-rays (7.9 and 13.8 keV) is possible at 3rd gen- 
eration synchrotron radiation sources. For technical reasons, only optics with rather 
small aperture sizes have been fabricated in Göttingen. To compensate, pre-focusing 
optics (Kirkpatrick-Baez mirrors, KBs, or Compound Refractive Lenses, CRLs) were 
used to increase the flux density on the zone plates. In the first experiment, an MLL
with a height of 401 nm was illuminated by a pre-focused KB mirror beam. From
far-field measurements, a focal spot size of 6.8 nm (FWHM) was reconstructed at
13.8 keV photon energy [3].

Fig. 22.1 Reconstructed intensity profile of an MZP focus: a two-dimensional rendering on linear
colour scale, b one-dimensional intensity cuts with Gaussian fit. The full width at half maximum
is determined to be 4.3 nm × 4.7 nm. Adapted from [4]

Later, a two-dimensional hard X-ray focus was achieved. Two MLL optics
were successfully crossed, and the fabrication of a round MZP optic succeeded as well: a
depth-graded circular W/Si multilayer was deposited onto a rotating W wire. In contrast
to lithographic fabrication, virtually unlimited aspect ratios of outermost zone
width (here: down to 5 nm) to optical thickness (here: the length of straight sections of
the wire, usually several hundred micrometres) are possible. In a first experiment,
again with the nano-focusing optic placed in the KB beam, an unprecedented focal
spot size of 4.3 nm × 4.7 nm was reconstructed (see Fig. 22.1) [4].

Backed by these achievements, the fabrication of round MZPs was improved
significantly. In the following, we will concentrate on research dedicated to further
optimising the lenses and applying them to new hard X-ray imaging schemes [5]. First,
we will cover the advanced design and improved fabrication; then we will review
different experiments performed at synchrotron setups, and present results obtained
during those imaging experiments.

Preliminary imaging experiments have been performed; see Fig. 22.2a for a
scanning-SAXS measurement on a Siemens star [5]. In Fig. 22.2b, c, multi-order
images of semiconductor nanowires are shown. Scanning experiments without an
Order Sorting Aperture yield information both from the holographic −1st order and
the focused +1st order. Figure 22.2b shows an overview hologram of nanowires;
the information in the centre of the image is missing due to a beamstop. A high-resolution
scanning-SAXS measurement in differential phase contrast mode of a
nanowire on an electrical contact, however, is shown in Fig. 22.2c. Both images are extracted
from the same dataset.

For a detailed description of Fresnel and Multilayer Zone Plates, see Chap. 3. 




[Figure panels: (a) STXM brightfield, (b) holography, (c) STXM phase contrast]
Fig. 22.2 a First MZP-based STXM brightfield scan of a Siemens star test pattern, adapted from
[5]. The smallest features of 50 nm could not be resolved due to vibrations of the early experimental
setups. b Holographic image obtained from the diverging −1st order, after overlaying shifted sample
positions. The sample consists of semiconductor nanowires [6] and electrical contacts. The centre
is not visible in this image due to a beamstop; c it is visible, however, in scanning-SAXS mode, here differential phase
contrast. The image has been extracted using a “software-OSA” from the same measurement as the
hologram, and shows a single nanowire lying on an electrical contact. The tip of the nanowire
consists of a Au sphere


22.2 Let There be an Ideal World 


In this section we show simulation results for “perfect” zone plates to assess the best 
case scenario and to devise new measurement techniques. 

Optically thin Fresnel Zone Plates (FZPs) are constructed via the Zone Plate Law,
cf. Sect. 3.5. To model thin FZPs, an incoming wave-field can be multiplied pixel-wise
with a complex-valued transmission function. For thick Multilayer Zone Plates,
however, this approximation does not hold. Wave-optical propagation modelling
through the structure usually has to be applied if

\[
  F = \frac{(\Delta r_N)^2}{\lambda\, t} \lesssim 1 , \tag{22.1}
\]

with outermost zone width Δr_N, wavelength λ, and optical thickness t. For
Δr_N = 5 nm and λ = 1 Å, the geometrical approximation becomes invalid for t >
250 nm.
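
As a quick numerical check of criterion (22.1), the following minimal Python sketch evaluates F for a given optical thickness and the thickness at which F drops to 1. The function names are illustrative and not taken from the chapter; the 7 µm thickness is the value quoted later for the fabricated lenses.

```python
def fresnel_number(zone_width_m, wavelength_m, thickness_m):
    """F = (dr_N)^2 / (lambda * t), as in criterion (22.1)."""
    return zone_width_m ** 2 / (wavelength_m * thickness_m)


def critical_thickness(zone_width_m, wavelength_m):
    """Optical thickness at which F = 1; thicker optics need wave-optical modelling."""
    return zone_width_m ** 2 / wavelength_m


dr_n = 5e-9          # outermost zone width: 5 nm
wavelength = 1e-10   # X-ray wavelength: 1 Angstrom

t_crit = critical_thickness(dr_n, wavelength)
print(f"critical thickness: {t_crit * 1e9:.0f} nm")
print(f"F at t = 7 um: {fresnel_number(dr_n, wavelength, 7e-6):.3f}")
```

For the parameters of the text, the critical thickness is 250 nm, so a lens of several micrometres optical thickness lies deep in the regime F < 1.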

Fig. 22.3 Intensity (top) and phase (bottom) simulated inside the outermost zones of a flat (left),
tilted (centre), and wedged (right) MZP; the grey arrows show the direction of the Poynting vector
field, i.e. the “flow” of intensity. In the flat geometry, a “beating mode structure” known from array
waveguides emerges after only 1 µm, the Poynting vectors are almost unaltered in direction, and
the phase fronts are almost perpendicular to the incoming beam direction; hence, the focusing
efficiency is very limited. In the tilted case, the impinging field is coupled into the channels for
about 2 µm, and a focusing effect becomes visible in the outgoing intensity pattern. Only in
the wedged case, however, do the Poynting vectors also align in the direction of the desired focal spot

The “imprint” of an infinitely thin zone plate on an impinging wave field Ψ(x, y)
is usually described by a complex-valued transmission function T(x, y); the outgoing
field can then be propagated to e.g. the focal plane using the Fresnel-Kirchhoff integral.
For longitudinally extended objects, multi-slice approaches with various propagators
are used; these alternate between transmission functions and numerical free-space
propagation over “short” propagation distances with F ≫ 1 instead of (22.1). In the
general case, T(x, y, z) can also change along the optical axis, so volume zone plates
with tapering can be described. Here, we have mostly made use of the paraxial wave
equation described in Chap. 2.
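
The alternation of transmission functions and short free-space steps can be sketched in a few lines. The following one-dimensional Python example is only an illustration under stated assumptions, not the solver used for the simulations in this chapter: the grid (64 pixels of 1 nm), the slice thickness of 10 nm, the refractive-index contrast and all function names are hypothetical, and a naive discrete Fourier transform is used to stay dependency-free (an FFT library would be used in practice).

```python
import cmath
import math


def dft(values, inverse=False):
    """Naive O(N^2) discrete Fourier transform; adequate for small demo grids."""
    n = len(values)
    sign = 1.0 if inverse else -1.0
    result = []
    for k in range(n):
        acc = 0j
        for m, value in enumerate(values):
            acc += value * cmath.exp(sign * 2j * math.pi * m * k / n)
        result.append(acc / n if inverse else acc)
    return result


def free_space_step(field, dx, wavelength, dz):
    """Paraxial free-space step: multiply the spectrum by exp(-i pi lambda dz q^2)."""
    n = len(field)
    spectrum = dft(field)
    for k in range(n):
        q = (k if k < n // 2 else k - n) / (n * dx)  # spatial frequency in 1/m
        spectrum[k] *= cmath.exp(-1j * math.pi * wavelength * dz * q * q)
    return dft(spectrum, inverse=True)


def multislice(field, transmission_slices, dx, wavelength, dz):
    """Alternate thin-slice transmission and free-space propagation."""
    for t_slice in transmission_slices:
        field = [f * t for f, t in zip(field, t_slice)]
        field = free_space_step(field, dx, wavelength, dz)
    return field


n_pix = 64
dx = 1e-9             # lateral sampling (assumed)
wavelength = 1e-10    # 1 Angstrom
dz = 10e-9            # slice thickness: (5 nm)^2 / (lambda * dz) = 25, i.e. F >> 1
k0 = 2 * math.pi / wavelength
delta_contrast = 1e-5  # illustrative difference in refractive-index decrement

# Flat geometry: 5 nm wide zones, identical phase-shifting pattern in every slice.
layer = [cmath.exp(-1j * k0 * delta_contrast * dz) if (i // 5) % 2 == 0 else 1.0 + 0j
         for i in range(n_pix)]
slices = [layer] * 5

field_in = [1.0 + 0j] * n_pix
field_out = multislice(field_in, slices, dx, wavelength, dz)
power_in = sum(abs(v) ** 2 for v in field_in)
power_out = sum(abs(v) ** 2 for v in field_out)
print(f"power in: {power_in:.6f}, power out: {power_out:.6f}")
```

Because both the phase-only slices and the propagator have unit modulus, the total intensity is conserved, which is a convenient sanity check. A tilted or wedged geometry would correspond to shifting or rescaling the pattern from slice to slice, i.e. to a z-dependent T(x, y, z).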

Such tapering becomes important at higher X-ray energies (i.e., longer devices)
and smaller zones (i.e., smaller foci). This is illustrated by the simulation in
Fig. 22.3, where three layouts of MZPs are compared. The three columns show the
intensity (top) and phase (bottom) for a flat MZP (left), a tilted MZP (centre) and a
wedged MZP (right). Dashed lines show the nano-focusing zones on a scale of 5 nm,
over an optical thickness of about 9.8 µm.

In both the flat and the tilted case, the phase fronts are almost flat and perpendicular to the
optical axis. Also, a “mode beating” effect known from array waveguides can be seen
in the intensity patterns. In the flat case, this beating starts at around 1 µm, while for
the tilted case the incoming wave is coupled for about 2 µm into the “channels”. In the
flat case, the outgoing wave field reproduces the “checker board” pattern; the tilted
geometry, on the other hand, shows stripes directed towards the focus, promising
enhanced efficiency.

The grey arrows show the local direction of the Poynting vector field. In both the flat and the tilted
case, this field points almost parallel to the optical axis. Since the Poynting vector
locally measures the energy flow, this shows that the focusing efficiency into the
+1st order is rather small.

For the wedged case, on the other hand, the Poynting vectors are bent towards the
focal point.


22.3 Back to the Real World: Fabrication Challenges


Now we will leave the neat and clean simulations to “dive into the real world”. 

In this section we briefly review a particular fabrication technique for Multilayer 
Zone Plates. Firstly, by Pulsed Laser Deposition (PLD), a rotating glass fibre is coated 
with alternating layers; secondly, the final MZP is sliced out using a Focused Ion 
Beam (FIB) device. 

First, a depth graded multilayer was grown on a wire according to the Zone Plate 
Law by PLD. Then, the MZP was fabricated by cutting a slice out of the multilayer 
with desired optical depth by FIB. The MZP was positioned onto a W tip, which can be 
used as sample holder during hard X-ray focusing. For the first focusing experiments, 
the MZP was illuminated by a pre-focused X-ray beam at the coherence beamline 
P10 of the PETRA III synchrotron. Later on, a sample is either put into the MZP 
focus (scanning-SAXS/scanning-WAXS) or into the MZP defocus (holography) for 
imaging experiments. 


22.3.1 Pulsed Laser Deposition 


As a thin film sputtering technique, PLD allows the deposition of multilayer structures
with reliable layer thicknesses just in the range suitable for hard X-ray focusing
optics. One advantage over other sputtering techniques originates in the highly energetic
particle bombardment: new particles enter the structure and increase the mobility. In the
end, this “self-healing” property decreases roughness and tends towards cylindrical
layers even on a slightly elliptical wire.

The thin film deposition was realized by a computer-controlled KrF excimer laser
(wavelength of 248 nm, pulse duration of 30 ns, repetition rate of 10 Hz). The laser
beam was focused onto the different targets in ultrahigh vacuum of about 10⁻⁸ mbar.
The targets were moved constantly following an algorithm that allows uniform ablation
from different directions. The films were grown at room temperature at a target-to-substrate
distance of 65 mm. By changing both the distance of the focusing lens
to the target and the laser energy, the laser fluence was controlled in a range of
1–5 J/cm² [7].

Different material combinations have been tested during this Collaborative
Research Centre. We will present the results for W/Si, W/ZrO2, and the final combination
Ta2O5/ZrO2 below [8-10].


22.3.2 FIB Processing 


Within the FIB facility (Nova NanoLab 600, FEI), an additional
layer of Pt is deposited onto the multilayer by electron beam deposition for further protection. Then, the
specimen is cut by a 30 kV Ga⁺-ion beam (using a current of about 1 nA) close
to the region where the wire has its desired diameter. A piece of the coated wire
is attached to a thin W micromanipulator with electron- and ion-beam deposited Pt
and transferred onto the final lens holder, e.g. a W tip prepared beforehand. Afterwards,
the micromanipulator is cut off and drawn back, leaving the coated
wire on the tip. Finally, the lens is shaped and polished by less energetic Ga⁺ ions
(5 kV, ca. 30 pA) down to the desired optical thickness, usually around 6 µm; for the
high-energy experiment, an MZP of 30 µm optical thickness has been prepared.


22.3.3 From MLL to MZP 


For the table-top soft X-ray microscope developed in project C04, one-dimensional
Multilayer Laue Lenses (MLLs) have been manufactured. Unlike round MZPs,
these consist of parallel films on a flat substrate. It was found that PLD also works on
curved substrates, and hence two-dimensionally focusing devices can be manufactured.
To this end, a rotation motor was incorporated into the PLD vacuum chamber;
the alternating layers are then deposited onto a thin wire while the latter rotates
around its axis.

Geometrically, the layer thickness is reduced by a factor of π for the same number
of pulses. Experimentally, however, transfer factors of TF ≈ 3.8 (Ta2O5) and
TF ≈ 3.3 (ZrO2) were found. The deviation from the geometrical TF_geo = π can
be explained by resputtering and reflection during the deposition on tilted substrates.
From experimental data it is found that the dependence of the deposition rate on the
angle γ to the substrate is stronger than the expected cos γ. This behaviour
could be reproduced using SDTrimSP simulations [8].

Note that due to aging of the target materials the deposition rates also change in 
time. This can be compensated by changing the number of pulses accordingly. 

Glass wires for the MZP were prepared in a heat-and-pull process using standard
glass fibres [9, 11]. The core wire of the Ta2O5/ZrO2 MZP was prepared using a Sutter
Instrument Flaming/Brown micropipette puller system P-1000. With this technique
it has become possible to prepare round glass fibres of suitable diameters between
1 µm and 2 µm. Within our parameters, this means that about the first ten zones
of the Zone Plate Law are missing. In addition, it is possible to draw tapered
fibres with an opening angle of a few degrees, see Fig. 22.4a.
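
The number of zones lost to the core wire can be estimated from the Zone Plate Law. The sketch below assumes the common form r_n² = nλf + (nλ/2)² and the relation f ≈ D Δr_N/λ between diameter D, outermost zone width Δr_N and focal length; the wavelength of 0.9 Å (roughly 13.8 keV) is an assumption, and the function names are illustrative.

```python
import math


def focal_length(diameter_m, outer_zone_width_m, wavelength_m):
    """First-order focal length from f = D * dr_N / lambda."""
    return diameter_m * outer_zone_width_m / wavelength_m


def zone_radius(n, wavelength_m, focal_m):
    """Outer radius of zone n according to r_n^2 = n*lambda*f + (n*lambda/2)^2."""
    return math.sqrt(n * wavelength_m * focal_m + (n * wavelength_m / 2) ** 2)


def zones_inside(radius_m, wavelength_m, focal_m):
    """Number of complete zones whose outer radius does not exceed radius_m."""
    n = 0
    while zone_radius(n + 1, wavelength_m, focal_m) <= radius_m:
        n += 1
    return n


wavelength = 0.9e-10   # assumed: about 13.8 keV
f = focal_length(16e-6, 5e-9, wavelength)
missing_thin = zones_inside(0.5e-6, wavelength, f)    # 1 um wire diameter
missing_thick = zones_inside(1.0e-6, wavelength, f)   # 2 um wire diameter
print(f"focal length: {f * 1e3:.2f} mm")
print(f"missing zones: {missing_thin} (1 um wire), {missing_thick} (2 um wire)")
```

Under these assumptions the focal length is just below 1 mm, and a wire of 1 µm to 2 µm diameter replaces between 3 and 12 of the innermost zones, in line with the "about ten" quoted above.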


22.3.4 Material and Parameter Studies 


During this Collaborative Research Centre, different material combinations (W/Si,
Ta2O5/Si, and Ta2O5/ZrO2) have been tested, and the PLD parameters (mainly the
laser energy, or flux density) have been optimised.

Fig. 22.4 a SEM image of a tapered glass fibre, prior to coating: both tapering angle (here about
0.9°) and diameter (here around 2 µm) can be tweaked during the pulling process. b In situ deposition
rate measurements during target aging for Ta2O5. For a laser fluence of 1.6 J/cm² (black), plate-like
structures are formed on the target surface, leading to a strong decrease of the deposition rate
at higher pulse numbers. In contrast, the deposition rate becomes extraordinarily stable at 2.6 J/cm²
(blue). The SEM pictures show the corresponding target morphologies after 10° pulses. From [8]


When calculating the number of pulses necessary to fulfil the Zone Plate Law, the 
deposition rate has to be known. The rate depends on the material and laser energy; it 
was also found that during the deposition process, the rate changes on short and long 
time scales due to target aging. Deposition rates have been obtained from reflectivity 
measurements on coated flat substrates. The time-dependent rates are then used by 
the control software to calculate the number of pulses for the individual layers. 
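
How pulse numbers follow from layer thicknesses, deposition rates and the transfer factor can be sketched as below. This is not the actual control software: the rate model, the numerical rate of 0.05 nm per pulse and its decay constant are purely illustrative assumptions, and only a single target is treated, whereas a real multilayer alternates between two targets with separate rates.

```python
import math


def pulses_for_layer(thickness_nm, flat_rate_nm_per_pulse, transfer_factor):
    """Pulses needed for one layer on the rotating wire.

    The rate on the wire is the flat-substrate rate divided by the transfer factor.
    """
    wire_rate = flat_rate_nm_per_pulse / transfer_factor
    return round(thickness_nm / wire_rate)


def plan_deposition(thicknesses_nm, transfer_factor, rate_at):
    """Pulse numbers per layer, re-evaluating the ageing rate before each layer.

    rate_at(n) is a hypothetical model returning the flat-substrate rate
    (nm per pulse) after n pulses have been fired on this target.
    """
    plan, used = [], 0
    for thickness in thicknesses_nm:
        pulses = pulses_for_layer(thickness, rate_at(used), transfer_factor)
        plan.append(pulses)
        used += pulses
    return plan


def ageing_rate(n_pulses):
    """Illustrative exponential decay of the deposition rate with target ageing."""
    return 0.05 * math.exp(-n_pulses / 1e4)


constant_plan = plan_deposition([5.0, 5.0], 3.8, lambda n: 0.05)
ageing_plan = plan_deposition([5.0, 5.0, 5.0], 3.8, ageing_rate)
print("constant rate:", constant_plan)
print("ageing target:", ageing_plan)
```

With a constant rate, equal layers need equal pulse numbers; with an ageing target, later layers of the same thickness require more pulses, which is the compensation described in the text.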

The W/Si system in particular suffered from strongly varying deposition rates. With
increasing number of laser pulses, for both materials the deposition rate first rises strongly
and then decreases monotonically again. The very early stages of target aging
depend on the initial surface topography. On longer time scales, the deposition rate
decreases exponentially with the number of pulses; this is attributed to further surface
roughening (for W) and to cone formation (for Si). Also, the W/Si system suffers from
some limitations due to droplet formation during ablation of Si. The laser fluence
was tuned to 1.7 J/cm² to reduce the number of droplets; still, about 400 droplets
per mm² are found on a 10 nm layer.

To circumvent droplet formation, Si was replaced by ZrO2, which also offers a
large phase shift for X-rays when combined with W. In contrast to Si, the surface of
ZrO2 targets remains relatively smooth at a laser fluence of 1.8 J/cm², and droplet
formation is almost completely avoided. Furthermore, the deposition rate of ZrO2 is
about five times higher than for Si (about 45 nm per 10° pulses) and remains much
more stable. In contrast to W/Si, no enhanced resputtering is observed at the interfaces
of planar W/ZrO2 multilayers by in situ rate measurements, even at increased laser
fluence on the ZrO2 target.

To increase the deposition rates, W was replaced by Ta2O5; it was found that for a
laser fluence of 2.6 J/cm², the deposition rate becomes extraordinarily stable and gives
ideal conditions for the deposition of both thin and thick films. The development of
the Ta2O5 deposition rates for two fluences is shown in Fig. 22.4b.

Fig. 22.5 SEM images (top) and TEM images (bottom) of “D13”, an MZP with outermost zone
width of 5 nm and a diameter of 16 µm. The optical thickness (top left) of 7 µm is designed for an
X-ray energy of 14 keV. Adapted from [9]


Since the deposition rates of both materials were sufficiently high and did not
change significantly over time, multilayers with precise layer thicknesses and a larger
overall thickness of 1.2 µm could be deposited onto the glass wire, while almost no
droplet formation occurred.

Recently, MZPs with a diameter of 16 µm, outermost zones down to 5 nm, and
optical thicknesses of up to 7 µm have been built and successfully used in X-ray
experiments; see Fig. 22.5 for SEM and TEM images.


22.3.5 Summary


Due to high and almost constant deposition rates, minimization of resputtering, sharp
multilayer interfaces, and low transformation factors, Ta2O5/ZrO2 is a very promising
multilayer system for the fabrication of high quality lenses for hard X-rays by the
combination of PLD and FIB.


22.4 The World of Synchrotron Instrumentation 


Here we briefly describe the basic setup used at synchrotron radiation facilities for
the imaging experiments presented later on. Most measurements were carried out
using the GINIX end-station of the P10 beamline at the PETRA III synchrotron
(DESY, Hamburg). The high-energy experiment at 60 keV up to above 100 keV was
performed at the ID31 beamline of the European Synchrotron Radiation Facility
(ESRF; Grenoble, France); recently, a ptychographic measurement was successfully
carried out at the PtyNAMi setup of the P06 beamline at PETRA III.


22.4.1 Hard X-rays Near 14 keV 


Most experiments were carried out using the versatile Göttingen Instrument for 
Nano-Imaging with X-rays (GINIX) at the P10 coherence beamline of PETRA III 
[12]. From the undulator radiation, a monochromatic beam around 8.0 or 13.8 keV 
is filtered by either a double-crystal Si(111) monochromator, or by a more stable 
channel-cut monochromator. Then, with Compound Refractive Lenses (CRLs), the 
X-ray beam is pre-shaped onto the MZP and the sample. 

During first alignment of the zone plate, the CRLs are moved out of the beam.
Alignment of the MZP tip/tilt angles is eased when illuminated by a large beam
of around 200 µm diameter. For the MZP-based measurements, the CRL beam is
usually set to a beam size of approximately 30 µm, so an MZP with a diameter of
16 µm is illuminated homogeneously. It was found during the scanning SAXS/WAXS
measurements that combining the ultra-small focal spot of the MZP with a 2 µm
CRL beam and a lithographic FZP of 30 nm zone width provides a useful zoom-in
capability. Since all three kinds of optics are in-line (without lateral shifts), features
of the sample can be measured sequentially at different resolution levels.

For the success of these experiments, close collaboration with Michael Sprung,
responsible for the P10 beamline at DESY, Hamburg, and Tim Salditt, project leader
of C01, was crucial; we thank Michael, Tim, and their teams for outstanding support.


22.4.2 High Energies: From 60 to 101 keV


In contrast to Fresnel Zone Plates, which are traditionally fabricated using lithography
and hence achieve only small aspect ratios, Multilayer Zone Plates are a promising
optic for focusing even higher X-ray energies. The regime above 30 keV is usually
the domain of Compound Refractive Lenses; here, we show that MZPs can also be used
to focus photon energies from 60 to 100 keV. The proof-of-principle experiments were
carried out at the high-energy beamline ID31 at the European Synchrotron Radiation
Facility (ESRF) in Grenoble, France.

The MZP setup (see next subsection for more details) was integrated into the 
HEMD setup (High Energy Microdiffraction Instrument). This X-ray diffractometer 
is used to study buried interfaces; micro-focused X-ray beams of high energies can 
penetrate even metallic specimens. Scattering from buried interfaces (Reflectivity), 
hidden nano-scaled particles (SAXS), or packaged crystallites can be investigated. 
As detector, a Pilatus 2M single-photon counting pixelated hybrid detector with CdTe 
sensor material is used. 

For the experiment, an MZP with a diameter of 8 µm and smallest zones of 10 nm
was fabricated using PLD; with the FIB, a thick slice of 30 µm was cut out. This
length provides optimal phase shift at 60 keV.

We report on our experiment in the Imaging section further below [10]. 


22.4.3 Sample Scanner


In an early phase of the project, we learned that the versatile approach of the GINIX 
setup poses severe limits on resolution once the 10 nm region is targeted. To minimise 
vibrations and drift, a new sample tower was designed and commissioned. Only the 
most essential degrees of freedom for the imaging experiments are included. 

For the MZP, three translational stages based on the Piezo stick-slip principle by 
SmarAct GmbH (Oldenburg, Germany) are used for lateral alignment; the vertical 
z-movements are accomplished by an inclined linear positioner and an additional 
free-moving guideway to increase stability. On top, horizontal movements are facili- 
tated by one positioner plus one guideway (lateral y-direction) and by one positioner 
of increased stability (longitudinal x-direction). Travel ranges are 10mm in z (verti- 
cal), and >40 mm in x and y (horizontal). The x-motor then has an adjustable mount 
position for an FZP optic, and holds a tip/tilt motor for the MZP optic. This tip/tilt is 
adapted from a motorised optical mount with stick-slip motors in open-loop mode. 
The maximum angles are ±5°. A custom-built encoder based on a two-dimensional
Position Sensitive Detector (SEEPOS PSD Signal Process System with 2L45_MH02 
sensor; SiTek Electro Optics, Partille, Sweden) is being commissioned to improve 
the angular motorisation. 

For the sample, three translations by Physik Instrumente (PI GmbH, Karlsruhe,
Germany) are used for coarse alignment. For x and y, a PILine piezo ultrasonic drive
of large area (model M-686) is used as base; a vertical NEXACT Piezo stepping
drive (model N-765) with a high load capacity of 25 N is mounted on top. Travel
ranges are 25 mm in x and y, and 6.5 mm in z. On top of the alignment motors is
a vertically mounted Piezo scanner of high stiffness with a clear aperture of 50 mm
edge length (model P-733). The built-in capacitive sensors show an r.m.s. noise of
better than 0.2 nm. The maximum travel range is 30 µm; line scan speeds of 100 Hz
and more are possible at reduced travel range.

Fig. 22.6 Photograph of the Multilayer Zone Plate sample scanner (left) and encoder values
recorded during a fast two-dimensional scan (right; plot annotation: rms ≈ 0.14 nm). The MZP is mounted on a tip-tilt stage with
large translational stages by SmarAct; the sample is then coarsely aligned by three translations, and
can be raster-scanned using a two-dimensional piezo stage by PI Physik Instrumente. According to
encoder values, the positional accuracy of the continuous scan is better than 2 Å

All translational stages employ optical encoders with nanometre resolution. The 
non-linear, but reproducible movements of the tip/tilt stage have been characterised 
and adjusted in software by fourth order polynomials (Fig. 22.6). 

The stage was designed for fast Scanning Transmission X-ray Microscopy
(STXM) with new hybrid pixelated photon-counting detectors like the EigerX 4M
(Dectris Ltd., Baden-Dättwil, Switzerland). This particular detector is able to image
at 750 Hz frame rate; each of the four million pixels has a full analog-digital processing
chain to count single photon events. When configured for “fly scan mode”
(continuous STXM), the Piezo moves along a two-dimensional trajectory and starts a series
of detector acquisitions via hardware trigger; a common scan consists of 255 lines
with 255 detector frames each. In the fastest mode with 1.3 ms exposure times this
is done in less than three minutes. This could be shortened further in bi-directional
scanning mode [13, 14].
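
The quoted scan duration can be checked with simple arithmetic. The short sketch below computes only the pure exposure time of a fly scan; overheads such as line turnaround or data transfer are not modelled, and the function name is illustrative.

```python
def fly_scan_exposure_time(lines, frames_per_line, exposure_s):
    """Total exposure time of a fly scan, ignoring any per-line overhead."""
    return lines * frames_per_line * exposure_s


total_frames = 255 * 255
exposure_time_s = fly_scan_exposure_time(255, 255, 1.3e-3)
print(f"{total_frames} frames, {exposure_time_s:.1f} s of pure exposure")
```

About 85 s of pure exposure for 65,025 frames leaves ample margin within the stated three minutes for the return movement of the stage in uni-directional scanning.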

During fast scanning, inertial forces of the Piezo stage and sample holder have to
be compensated by the mechanical system; otherwise, the MZP itself moves significantly,
rendering the system unusable. We have measured the mechanical feedback
from the sample part onto the optic part of the setup. For that, the movement of the
MZP holder has been measured interferometrically, and vibrations induced by both
currently used and ultimate scanning parameters have been studied. It was found
that the mechanical feedback is usually less than 0.1%; this means that during a
STXM scan of 1,000 points, the MZP moves by less than one “pixel”. Hence, this
mechanical feedback does not induce new non-linearities into the system.

For data analysis of large STXM scans, the dedicated “Heinzelmännchen” cluster 
is capable of analysing up to 3,000 frames per second [15, 16]. 


22.4.4 Improvements of the GINIX Setup 


The GINIX instrument at the P10 beamline at PETRA III was designed for waveguide-based
holography at resolution scales of about 100 nm. With progress in optics
fabrication (both for waveguides and for zone plates), this became a limiting factor
in imaging. Together with project C01, several measures have been implemented
to improve the stability. But first, the vibrations had to be quantified in a reliable
manner. To this end, two techniques have been applied. (i) Laser interferometry
measures the relative distance between two objects; this allows the translational
amplitude to be quantified in absolute units, but only relative between emitter and reflector.
(ii) Acceleration sensors (accelerometer 731-207 by Wilcoxon
Research Inc., Meggitt PLC, Dorset, UK; including Dataq DI-155 data logger/Red
Pitaya STEMlab), on the other hand, can measure absolute movements in space, but only the second time
derivative. Numerical integration to extract the distance signal cannot reproduce slow
and long-term drift-like movements. Note that both methods yield one-dimensional
data.
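
Why integrated accelerometer data recover vibration amplitudes but not drift can be illustrated with a synthetic signal. The sketch below is an assumption-laden toy example, not the analysis used at the instrument: it integrates a 100 Hz, 50 nm vibration twice with the trapezoidal rule and removes the mean after each integration as a crude substitute for the unknown integration constants.

```python
import math


def cumulative_trapezoid(samples, dt):
    """Running trapezoidal integral of equally spaced samples, starting at zero."""
    out = [0.0]
    for previous, current in zip(samples, samples[1:]):
        out.append(out[-1] + 0.5 * (previous + current) * dt)
    return out


def remove_mean(samples):
    """Subtract the average; stands in for the unknown integration constant."""
    mean = sum(samples) / len(samples)
    return [s - mean for s in samples]


def displacement_from_acceleration(acceleration, dt):
    """Integrate twice, removing the mean after each step."""
    velocity = remove_mean(cumulative_trapezoid(acceleration, dt))
    return remove_mean(cumulative_trapezoid(velocity, dt))


amplitude = 50e-9            # 50 nm vibration amplitude (synthetic)
omega = 2 * math.pi * 100.0  # 100 Hz
dt = 1e-4                    # 10 kHz sampling, 1000 samples = 10 full periods
acceleration = [-amplitude * omega ** 2 * math.sin(omega * i * dt) for i in range(1000)]

displacement = displacement_from_acceleration(acceleration, dt)
print(f"recovered amplitude: {max(displacement) * 1e9:.2f} nm")
# A constant-velocity drift added to the true motion has zero acceleration,
# so it leaves the input, and hence the reconstruction, unchanged.
```

The periodic motion is recovered to well within one per cent, whereas any slow drift is invisible in the acceleration signal and therefore lost, as noted in the text.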

After the implementation of vibration measurements, different strategies to 
improve the situation have been evaluated. We briefly discuss a few methods and 
share lessons learned. 

Active vibration isolation using e.g. the Nano Series (Accurion GmbH, 
Göttingen) is a portable plug-in system that measures and actively damps vibra- 
tions in six degrees of freedom. We found that amplitudes of 50nm and below are 
already “too good” for the system to work properly; instead, high frequencies were 
shifted to drift on the sub-second scale. Within the requirements of C01 and C12,
the system did not work as desired. 

Passive vibration isolation using a mechanical mass damper was implemented 
based on finite-element simulations modelling the vibrations. The principle is that 
an inert mass is accelerated by the vibrations, and the movements are subsequently 
dissipated by pushing a rubber absorber. During tuning of the resonance frequency, 
virtually no measurable effect could be seen; however, vibrations change significantly 
from night to day. It is assumed that the rubber absorber does not dissipate enough
energy for very small amplitudes. 

Further attempts include additional air springs, rubber mounts, and foams. Also, 
it has been tried to shift the resonance frequencies by additional masses on the 
breadboard. No measurable improvement of the vibrations could be achieved. 

High-resolution platforms are based on additional granite tables and combined sample
towers with integrated mountings for optics. While the general flexibility of the
GINIX instrument is retained, high-resolution experiments are enabled by additional
setups. For more details on the sample tower of project C12, see the previous
subsection.


22.5 Imaging 


In this section, we report on several imaging experiments. During the progress of 
project C12, many different imaging modalities have been implemented and tested 
for their suitability to study different samples in different geometries. Here, only a 
selection of results is presented. 


22.5.1 Ptychography 


During the second and at the beginning of the third funding period of our CRC,
several attempts at a ptychographic reconstruction of the MZP focus were made,
without success. Ptychography is a scanning technique that uses overlapping measurements
to support both phase retrieval and the separation of wave-fields into
illumination and sample scattering. Ultra-small beam sizes of MZPs on the order of
sub-10 nm, however, introduce several problems, which were identified and subsequently
resolved.

Vibrations on the order of 50 nm are a drawback of the flexibility of the GINIX
setup. More details have been given in the previous section. First experiments suffered
considerably; moreover, because online reconstruction was not possible during early stages of the
project, visual feedback to steer the experiment was missing.

Sampling issues due to the diverging −1st diffraction order rendered the reconstruction
impossible. For ptychography to work correctly, the maximum beam size is
bounded by λR/Δ, where R is the sample-detector distance and Δ the detector’s
pixel size. With Pilatus detectors, and hence Δ = 172 µm at a distance of R = 5 m,
the illuminated area has to be smaller than about 1 µm in diameter. The negative
order of our MZP optics, however, illuminates a circle of more than 30 µm.
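
The sampling bound can be evaluated directly. The sketch below assumes a wavelength of 1 Å, which is not stated in this paragraph; with that value the bare expression gives roughly 3 µm, the same order of magnitude as the "about 1 µm" quoted above, which presumably includes a safety margin (an assumption on our part). The function name is illustrative.

```python
def max_probe_size(wavelength_m, distance_m, pixel_size_m):
    """Upper bound lambda * R / Delta on the illuminated area for proper sampling."""
    return wavelength_m * distance_m / pixel_size_m


wavelength = 1e-10        # assumed wavelength: 1 Angstrom
bound = max_probe_size(wavelength, 5.0, 172e-6)
negative_order_size = 30e-6   # illuminated circle of the -1st order (from the text)

print(f"sampling bound: {bound * 1e6:.2f} um")
print(f"-1st order exceeds the bound by a factor of {negative_order_size / bound:.0f}")
```

Whatever margin is applied, the 30 µm wide negative order violates the bound by about an order of magnitude, which is why it has to be blocked or otherwise separated.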


22.5.1.1 Results at the PtyNAMi Instrument 


To overcome the sampling issue of the negative order, we have successfully introduced
an OSA close to the focal spot during a beamtime at the PtyNAMi instrument
of the P06 beamline (PETRA III, DESY Hamburg). Due to the short focal length of
f ≈ 1 mm, the OSA needs to be positioned about 200 µm upstream of the focus.
Then, a Siemens star test pattern was moved from the downstream side close to the
focal region; after a careful approach, ptychographic scans could be performed.
Results of the reconstruction are shown in Fig. 22.7. Figure 22.7a shows the reconstructed
phase of the Siemens star; the smallest features of 50 nm are clearly resolved.
Damaged regions in the central part from an earlier experiment are also visible. Figure
22.7b shows the intensity distribution in the best focal plane after numerical propagation;
horizontal and vertical line cuts with Gaussian fits are shown in Fig. 22.7c.
The FWHM of the focal spot can be estimated at about 10 nm.

Fig. 22.7 a Ptychographic phase reconstruction of a Siemens star test pattern, b shows the intensity
of the X-ray nano focus on a linear colour map in the best focal plane, c shows horizontal and vertical
line cuts, with Gaussian fits of about 10 nm × 11 nm

Note that the MZP used was not the “design cut” from the wire, and alignment
could not be perfected during the ptychographic experiment within the allocated
time slot. Nonetheless, this is the first successful ptychographic reconstruction of the
MZP focus, indicating that resolution on the single-digit nanometre scale is within reach.


22.5.2 Holography and Scanning SAXS 


Holography is a full-field imaging method, and the contrast is based on the local 
electron density inside the specimen and its imprint in the near-field intensity distri- 
bution. Scanning SAXS, on the other hand, is an imaging modality where sample or 
focused X-ray beam are scanned, and the contrast is extracted from far-field diffrac- 
tion patterns and carries information about ordered structures inside the illuminated 
area [17-22]. For diffractive optics like Zone Plates, usually an order sorting aper- 
ture (OSA) is placed between sample and FZP. With a short focal length of ~1 mm 
for the MZPs used here, the alignment of an OSA becomes impractical; hence, a 
“software OSA” has been used to extract multi-order holographic images from com- 
bined scanning-holography datasets. The scheme for each order is similar to the 
propagation-based imaging described in Chap. 2. 

For the sake of simplicity, we make use of the first diffractive orders, +1st and −1st;
with the sample placed in a defocus plane $\Delta x \neq 0$, the propagation distances are
$x_{+1} = \Delta x$ and $x_{-1} = 2f + \Delta x$; with the detector placed at a position $x_2$ from the
MZP, the effective distances become


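\[
x_{+1}^{\mathrm{eff}} = \frac{\Delta x \, x_2}{\Delta x + x_2} \approx 0.1\,\mathrm{mm}, \qquad
x_{-1}^{\mathrm{eff}} = \frac{(2f + \Delta x)\, x_2}{2f + \Delta x + x_2} \approx 2.1\,\mathrm{mm}.
\]

For the numerical values, we have assumed a focal distance $f = 1$ mm, a defocus
distance $\Delta x = 0.1$ mm, a detector distance $x_2 = 5$ m, an X-ray wavelength $\lambda = 1$ Å,
and a detector pixel size of $p = 75$ µm.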
The corresponding Fresnel numbers and magnifications are 


M4) M) 
Fy, = IMA 348109, ya!" 390% 10: 
AXX AX x2 
A 2 A 
Wan a si, Mi, = tar 9399 x, 
Ax 2f + Ax 
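
The quoted effective distances can be checked with a few lines of arithmetic. The following is a minimal sketch using the numerical values stated in the text; the function names are hypothetical, and the magnification is assumed to take the standard cone-beam form $(x_1 + x_2)/x_1$.

```python
# Worked numbers for the two first-order holograms (all lengths in metres).
f = 1e-3        # focal distance of the MZP (value quoted in the text)
dx = 0.1e-3     # defocus distance Delta x
x2 = 5.0        # detector distance measured from the MZP


def effective_distance(x1, x2):
    """Effective propagation distance x1*x2/(x1 + x2) of a cone-beam geometry."""
    return x1 * x2 / (x1 + x2)


def magnification(x1, x2):
    """Geometric magnification (x1 + x2)/x1 (assumed standard cone-beam form)."""
    return (x1 + x2) / x1


x_plus = dx             # focus-to-sample distance for the +1st order
x_minus = 2 * f + dx    # focus-to-sample distance for the -1st order

for name, x1 in (("+1st", x_plus), ("-1st", x_minus)):
    print(name, effective_distance(x1, x2) * 1e3, "mm,", magnification(x1, x2), "x")
```

Under these assumptions the +1st order is magnified roughly twenty times more strongly than the −1st order, which is why the two holograms separate when the sample is scanned.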


When the sample is laterally scanned in the defocus plane Ax, both holograms 
“move” with different velocities in the detector plane; by simply re-arranging the
pixelated intensity values according to the magnifications $M_{\pm 1}$, two scaled holograms
can be obtained. Also, an “average flat-field” can be extracted from the measurement. 

For $\Delta x \to 0$, $F_{+1} \to 0$, and the +1st signal approaches a traditional scanning
transmission X-ray microscopy (STXM) contrast; to first order, differential phase
contrast maps can be deduced from tracking the first moment (centre of mass) of the
intensity distribution:


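\[
\varphi'(y) = k \sin\delta = \frac{2\pi}{\lambda}\, \sin\!\left[\tan^{-1}\!\left(\frac{\int I(y)\, y\, \mathrm{d}y}{\int I(y)\, \mathrm{d}y}\right)\right]
\]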
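
The centre-of-mass evaluation described above can be sketched numerically. This is an illustration only: the function name is hypothetical, the profile is synthetic, and converting the first moment into a deflection angle by dividing by the detector distance is an assumption about the geometry.

```python
import numpy as np


def dpc_phase_gradient(intensity, y, wavelength, distance):
    """Phase gradient from the first moment of a 1D far-field intensity profile."""
    com = np.sum(y * intensity) / np.sum(intensity)   # centre of mass on the detector
    angle = np.arctan(com / distance)                 # deflection angle (assumed geometry)
    return 2 * np.pi / wavelength * np.sin(angle)


# Synthetic check: a Gaussian beam displaced by 150 um on a detector 5 m away.
y = np.linspace(-5e-3, 5e-3, 2001)                    # detector coordinate in m
profile = np.exp(-0.5 * ((y - 150e-6) / 300e-6) ** 2)
grad = dpc_phase_gradient(profile, y, wavelength=1e-10, distance=5.0)
print(grad)  # phase gradient in rad/m
```

Applying the function to every scan point yields a differential phase contrast map along one direction; the orthogonal direction follows from the first moment along the other detector axis.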

Combining these imaging modalities of the 5 nm MZP with a “traditional” (i.e.,
lithographically produced) FZP with an outermost zone width of 30 nm, and with
compound refractive lens (CRL) optics with a focus size of about 2 µm, multi-scale
imaging with a zoom-in capability has been explored. The different optics and fields
of view are shown in Table 22.1.

A phase reconstruction using the relaxed averaged alternating reflection (RAAR)
algorithm has been carried out on the −1st order hologram of semiconductor
nanowires (for the raw hologram, see Fig. 22.2). An overall view is shown in Fig. 22.8a,
while Fig. 22.8b shows a zoom-in on a single nanowire. These reconstructions stem 
from the holographic dataset in Fig. 22.2b; Fig. 22.2 shows a close-up of the (here 
invisible) nanowire lying on an electrical contact in (vertical) differential phase con- 
trast mode. 

Table 22.1 For multi-scale imaging, different optical illuminations and measuring schemes are
combined, with a decreasing field of view and increasing (ideal-case) resolution. The Compound
Refractive Lenses (CRL) are from the P10 beamline, DESY Hamburg; the Fresnel Zone Plate (FZP)
was generously provided by Christian David, PSI (CH); the Multilayer Zone Plate (MZP) is “D13”
from the C12 project

Optical setup    Measuring scheme          Field of view    Resolution limit
Parallel beam    Holography, SAXS, WAXS    >200 µm          Detector pixel size
MZP, −1st        Holography                30 µm            Demagnified pixel size
CRL pre-focus    Alignment scan            Overview scan    20 µm
CRL focus        Scanning SAXS/WAXS        Scan size        2 µm
FZP              Holography, scanning      Scan size        30 nm
MZP              Holography, scanning      Scan size        5 nm


Fig. 22.8 Phase-reconstruction from a holographic measurement on semiconductor nanowires (see 
Fig. 22.2). Using the relaxed averaged alternating reflection (RAAR) algorithm, quantitative phase 
information of the nanowires and electrical contacts can be extracted 


22.5.3 Scanning WAXS 


High X-ray energies beyond 30 keV offer long penetration lengths into bulk material, 
so that even hidden and buried particles can be studied. Many composite materials 
are crystalline, with particle sizes in the sub-µm region; also, crystal domains within
larger compounds of a few hundred nm in size are common. So far, small structures
can be studied ex-situ, i.e. as isolated or free-standing objects. But often, chem- 
ical and physical properties change when devices are assembled from individual 
components [10]. 

To demonstrate that MZP optics can be used in a scanning nano-WAXS setup,
we have fabricated a focusing optic with unprecedented aspect ratios for hard X-ray
energies. SEM images of the final MZP are shown in Fig. 22.9a, b. The lens has a
diameter of 8 µm with outermost zones of 10 nm, and an optical thickness of 30 µm.
This length is optimised for a phase-shifting zone plate at an X-ray energy of 60 keV.

Fig. 22.9 a Front SEM view of the high-E MZP, showing an 8 µm aperture. b Side SEM view of
the high-E MZP, showing the optical thickness of 30 µm. c An imprint of Ag droplets/nanocrystals,
sandwiched between two layers of ZrO2 (SEM top view). The inset shows an SEM image of one
droplet after it has been exposed with FIB. d Spatially resolved Bragg peak intensity, measured
at E = 60 keV. e 1D line scans of the same droplet (shifted for clarity), showing substructure in
scan 46 (purple). In case of scan 50 (light blue), the droplet serves as a knife edge with a steepness
of better than 50 nm. Note that the droplet was not actually in the best focal plane of the MZP.
f Spatially resolved Bragg peak intensity, measured at E = 101 keV

As a test sample, Ag droplets (nanocrystallites developing during the PLD process
for detuned parameters) were buried within 1.5 µm thick layers of ZrO2 on a planar
Si substrate. The typical droplet size is on the single-µm scale. Figure 22.9c shows
a top SEM view of the specimen. Although the droplets are buried, their shape and 
position can be seen as imprints on the surface. The inset shows an SEM image of a 
Ag droplet exposed with a FIB after the beamtime [10]. 

The experiment was performed at the high-energy station of ID31 at the ESRF 
(Grenoble, France). In the first user experiment with the new Laue-Laue bend- 
ing monochromator, X-ray energies at 60keV (bandwidth 0.44%) and 101keV 
(bandwidth 0.57%) were selected from the 5th and 9th harmonic, respectively, of
an in-vacuum cryo-cooled undulator with a 14.5 mm period. The X-ray beam was
pre-focused by up to 288 Be lenses in order to illuminate the MZP. The Bragg
diffraction patterns were recorded with a Pilatus3 CdTe 2M detector (Dectris Inc.,
Baden-Dättwil, Switzerland).

The undiffracted order of the MZP illuminated basically a diluted powder of
droplets and an amorphous thin Ag film. This resulted in a Debye-Scherrer ring
at 0.4 Å⁻¹, or a Bragg angle of 1.5° at 101 keV, on the detector. While the sample
was raster-scanned through the MZP focus, a single Bragg peak was heavily excited 
above background level. Figure 22.9d shows a real-space mapping of the intensity 
in a single Bragg peak, showing the outline of a particular crystal grain inside one 
droplet. The lateral size of the grain is about 500 nm, compatible with sizes seen with 
the FIB. 

Line scans through the intensity profile are shown in Fig. 22.9e for measurements 
at 60 keV, and for a different droplet (Fig. 22.9f) measured at 101 keV. The resolution 
has been determined to be better than 50nm at 60keV and even better than 40 nm 
at 101 keV. The full potential of the MZP focus depends of course on properly 
aligning the sample in the focal plane, which has not been achieved in the presented 
experiment. Nonetheless, the results show the feasibility of nano-focusing X-rays with
three-digit keV energies that are able to reach nano-sized particles buried deeply
inside bulk material [10].


22.5.4 Correlative Scans 


Recently, zone-plate-based holography was combined with scanning in both the SAXS
and WAXS regimes. To this end, project C12 teamed up with projects B03 and B10
(Simone Techert). A sample made of amino acid crystallites (D- and L-Tryptophan) 
was raster scanned sequentially with a large WAXS detector (EigerX in “front 
position” at about 0.3m) and then with a smaller SAXS detector (Pilatus in “rear 
position” at about 5.1 m). In addition, holographic datasets have been taken. Due to 
alignment difficulties, most scans were performed using a Fresnel Zone Plate (outer- 
most zone width 30 nm) and Compound Refractive Lenses (focal spot size on the um 
scale). A systematic measurement to study the growth process of such crystallites is 
envisioned; to spatially resolve the real-space sub-structure, a Multilayer Zone Plate 
will be used. 

This setup combines certain strengths of the different modalities. Holography is— 
in principle—a full-field technique; here it is combined with small scans of the sample 
to implement the “software OSA”. With the FZP, large overview images at moderate 
resolution can be obtained. Changing to the MZP optic, a zoom-in capability with
a field of view slightly larger than the crystallites and a spatial resolution of a few
nanometres becomes possible. The fundamental contrast mechanism is the phase shift
due to the electron density of the specimens. 

Scanning SAXS, on the other hand, is sensitive to surfaces and especially mor- 
phology; since length scales of up to 100nm can be probed, it is beneficial to use 
the CRL beam. With scanning WAXS, then, it becomes possible to distinguish the 
crystalline grains within the sample. 

Fig. 22.10 Dominant crystallite orientations mapped into real-space, for a pure Cu, b CuCl,
c D-Tryptophan (200) and d D-Tryptophan (010) lattices 


From preliminary scanning-WAXS measurements, the dominant crystallite orientations
can be mapped into real-space images. Figure 22.10 shows these dominant
orientations for (a) pure Cu, (b) CuCl, (c) D-Tryptophan (200), and (d) D-Tryptophan (010).

A detailed analysis to study correlations in reciprocal and real-space is still 
on-going. 


22.6 Summary 


During the second funding period, a collaboration of projects C01 and C04 has
demonstrated the first two-dimensional 5 nm X-ray focus. Building on this, project
C12 has further developed hard X-ray imaging in different areas. Together with
C01, the GINIX setup at the P10 beamline (DESY Hamburg) has been optimised in
terms of motorisation, reduction of vibrations and drift, automated data handling, and
ultimately image resolution. During a beamtime at the ESRF (Grenoble, France),
high-energy X-rays of 101 keV could be used to localise buried nano-crystallites at
a spatial resolution better than 50 nm. In a joint work with projects B03/B10, the
possibility of imaging and crystallite mapping has been successfully explored on
organic matter. A full analysis on the complex structures of the Tryptophan sample 
is, however, beyond the scope of project C12. 

With the retirement of one of the principal investigators (H.U.K.), the deposi- 
tion process and optimisation thereof has been paused; however, new MZPs have 
successfully been cut out of existing coated wires for new experiments.

Due to rapidly increasing amounts of data from scanning experiments, processing 
and analysis had stagnated for a while; with the Heinzelmännchen system, a dedicated
analysis platform for scanning SAXS data was commissioned and has served different
projects within this collaborative research centre.


Chapter 23
Convergence Analysis of Iterative
Algorithms for Phase Retrieval


D. Russell Luke and Anna-Lena Martins 


Abstract This chapter surveys the analysis of the phase retrieval problem as an 
inconsistent and nonconvex feasibility problem. We apply a convergence framework 
for iterative mappings developed by Luke, Tam and Thao in 2018 to the inconsistent 
and nonconvex phase retrieval problem and establish the convergence properties 
(with rates) of popular projection methods for this problem. Although our main 
purpose is to illustrate the convergence results and their underlying concepts, we 
demonstrate how our theoretical analysis aligns with practical numerical computation 
applied to laboratory data. 


Mathematics Subject Classification: 65K10 - 49K40 - 49M05 - 65K05 - 90C26 - 


23.1 Introduction


We highlight recent theoretical advances that have opened the door to a quantita- 
tive convergence analysis of well-known phase retrieval algorithms. As shown in 
Chap. 6, phase retrieval problems have a natural and easy characterization as feasibility
problems, and issues like noise and model misspecification do not affect the
abstract regularity of the problem formulation. This was also observed in studies
by Bauschke et al. [1] and Marchesini [2] reviewing phase retrieval algorithms in 
the context of fixed point iterations, though in those works the theory only provided 
convex heuristics for understanding the most successful algorithms. A slow progres- 
sion of the theory for nonconvex feasibility culminating in the work by Luke et al. 
in [3] now provides a firm theoretical basis for understanding most of the standard 
algorithms for phase retrieval. 

The approach is fixed-point theoretic and is based on a framework introduced
by Luke et al. in [3]. Given some (set-valued) mapping $T : \mathbb{E} \rightrightarrows \mathbb{E}$, where $\mathbb{E}$ is a
finite-dimensional Euclidean space, the algorithms are studied as mere generators
of sequences $(x^k)_{k\in\mathbb{N}}$ through the fixed point iteration $x^{k+1} \in Tx^k$ ($\forall k \in \mathbb{N}$) with
$x^k \to x^*$ where $x^* = Tx^*$. We demonstrate the convergence framework of [3] on a
few of the more prevalent iterative phase retrieval algorithms introduced in Chap. 6.

The analysis is based on two main properties. The first of these is the regularity 
of the mapping defining the fixed point iteration; the second property concerns the 
stability of the fixed points of the mapping. The first property is covered by the notion 
of pointwise almost averagedness, a generalization of regularity concepts like (firm) 
nonexpansiveness. Already in the 1960s Opial [4] showed that an iterative sequence 
defined by an averaged self-mapping with nonempty fixed point set converges to 
a fixed point. It is no surprise, then, that generalizations of averagedness should 
play a central role in convergence for more general fixed point mappings. In the 
setting of feasibility problems, i.e. finding a point in the intersection of a collection 
of sets, pointwise almost averagedness of the fixed point mapping is inherited from 
the regularity of the sets. 
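
The role of averagedness in Opial's result can be illustrated with a toy example that is not taken from the chapter. A rotation by 90° in the plane is nonexpansive with the origin as its only fixed point, yet its iterates circle forever; averaging it with the identity produces a mapping whose iterates converge. The averaging constant 0.5 is an arbitrary choice made for this sketch.

```python
import numpy as np

R = np.array([[0.0, -1.0], [1.0, 0.0]])   # rotation by 90 degrees: nonexpansive, Fix R = {0}
alpha = 0.5                                # averaging constant (hypothetical choice)
T = (1 - alpha) * np.eye(2) + alpha * R    # averaged mapping built from R


def iterate(M, x0, n):
    """Apply the linear map M to x0 n times and return the final iterate."""
    x = np.array(x0, dtype=float)
    for _ in range(n):
        x = M @ x
    return x


x0 = [1.0, 0.0]
x_rot = iterate(R, x0, 101)   # keeps circling on the unit circle
x_avg = iterate(T, x0, 101)   # contracts towards the fixed point 0
print(np.linalg.norm(x_rot), np.linalg.norm(x_avg))
```

In this linear example the averaged map shrinks the norm by a fixed factor in every step; for the nonconvex problems considered in this chapter such behaviour can only be expected locally.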

The other concept that is central to the analysis concerns stability of the fixed 
points. This is characterized by the notion of metric subregularity as presented in
Dontchev and Rockafellar [5], and Ioffe [6, 7]. Metric subregularity of the mapping 
at fixed points guarantees quantitative estimates for the rate of convergence of the 
iterates. This is closely related to the existence of error bounds, and weak-sharp min- 
ima, among other equivalent notions that provide a path to a quantitative convergence 
analysis. 

In Sect.23.2 we remind the reader of the phase retrieval problem. Section 23.3 
and its subsections introduce basic notations and concepts. This is followed by a 
toolkit for convergence in Sect. 23.4 that describes the convergence framework we 
are working with. The use of this theoretical toolkit is demonstrated on two of the 
most prevalent algorithms for phase retrieval. We conclude this chapter with some 
numerical remarks in Sect. 23.8. 


23.2 Phase Retrieval as a Feasibility Problem 


The phase retrieval problem reviewed in Chaps.2 and 6 involves reconstructing a 
complex valued field in a plane (the object plane) from measurements of its amplitude 
under a unitary mapping in a plane somewhere downstream from the object plane (the 
image plane). We use the notation for the phase retrieval problem already introduced 
in previous chapters. For a detailed description see Sect.6.1.1. The measurements 
are represented by the sets 


\[
M_j := \left\{ u \in \mathbb{C}^n \;\middle|\; |(F_j u)_i| = \sqrt{I_{ji}},\ i = 1, 2, \dots, n \right\} \quad (j = 1, 2, \dots, m). \qquad (23.1)
\]

The problem of recovering the phase from just the modulus of unitarily transformed
measurements is impossible to solve uniquely. Usually nonuniqueness is associated 
with ill-posedness, but for feasibility problems it is rather existence that is the source 
of difficulty. In real-world problems measurement errors and model misspecification 
have profound implications for feasibility models, but not for the reasons that one 
might expect. The geometry of the individual measurement sets does not change in the 
presence of noise or model misspecification. The issue is that the measurements are 
not consistent with one another. In other words, there is no solution that satisfies the 
measurements and other model requirements (like nonnegativity, in the case of real 
objects). A solution from the provided information is then only an approximation to 
the actual signal. Mathematically these characteristics translate into an inconsistent 
feasibility problem. That is, the intersection of the sets in the feasibility model is 
empty. Inconsistency has been investigated in many works (see for instance [8-11]) 
but most of these studies consider convex sets. Unfortunately, the sets involved in 
the phase retrieval problem are mostly nonconvex and have empty intersection. In [3]
the authors provided a scheme to handle even this case. The following sections are 
devoted to their work and present the most important concepts. 

To avoid ambiguities recovering the phase, one often uses a priori information 
about the model. Common examples are the knowledge of a support of the signal, 
real-valuedness, non-negativity, sparsity or the information about an amplitude: 


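\[
\begin{array}{ll}
\text{support constraint} & \mathcal{S} := \{ y \in \mathbb{C}^n \mid y_i = 0 \ \ \forall i \notin I \} \\
\text{real-valued support constraint} & \mathcal{S}_r := \{ y \in \mathbb{R}^n \mid y_i = 0 \ \ \forall i \notin I \} \\
\text{non-negative support constraint} & \mathcal{S}_+ := \{ y \in \mathbb{R}^n_+ \mid y_i = 0 \ \ \forall i \notin I \} \\
\text{amplitude constraint} & \mathcal{A} := \{ y \in \mathbb{C}^n \mid |y_k| = a_k,\ 1 \le k \le n \} \\
\text{sparsity constraint} & \mathcal{A}_s := \{ y \in \mathbb{R}^n \mid \|y\|_0 \le s \}
\end{array}
\qquad (23.2)
\]

for a set of indices $I \subset \{1, 2, \dots, n\}$, $a \in \mathbb{R}^n_+$ and $s \in \{1, 2, \dots, n\}$, where
$\mathbb{R}^n_+ = \{x \in \mathbb{R}^n \mid x_i \ge 0,\ 1 \le i \le n\}$. In the following we focus on the (non-negative)
support constraint.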


23.3 Notation and Basic Concepts 


Our setting throughout this chapter is a finite dimensional real Euclidean space $\mathbb{E}$
equipped with inner product $\langle \cdot, \cdot \rangle$ and induced norm $\|\cdot\|$. The open unit ball is
denoted by $\mathbb{B}$, whereas $\mathbb{S}$ stands for the unit sphere in $\mathbb{E}$. The open ball with radius
$\delta$ and center $x$ is denoted by $\mathbb{B}_\delta(x)$. The iterative algorithms we analyze can be
represented by mappings $T : \mathbb{E} \rightrightarrows \mathbb{E}$, where $\rightrightarrows$ indicates that $T$ is a point-to-set
mapping. $\mathbb{N}$ denotes the natural numbers. The inverse mapping $T^{-1}$ at a point $y$ in
the range of $T$ is defined as the set of all points $x$ such that $y \in T(x)$.


23.3.1 Projectors


We follow in this section the definitions introduced in Chap. 6. As a reminder: the
distance of a point $x$ to a set $\Omega \subset \mathbb{E}$ is defined by $\operatorname{dist}(x, \Omega) := \inf_{y \in \Omega} \|y - x\|$.
The corresponding projector onto the set $\Omega$ is given by $P_\Omega : \mathbb{E} \rightrightarrows \mathbb{E}$,
$x \mapsto \{ y \in \Omega \mid \operatorname{dist}(x, \Omega) = \|y - x\| \}$. A single element of $P_\Omega x$ is called a projection. Similarly to
the projector, the reflector onto a set $\Omega$ is defined by $R_\Omega : \mathbb{E} \rightrightarrows \mathbb{E}$, $x \mapsto 2P_\Omega x - x$,
which is again a set. A single element of $R_\Omega x$ is called a reflection.

The regularity of a set influences the properties of the corresponding projector 
onto the set. The best properties are generated by convex sets. A convex set 2 is 
defined as a set that contains the line segment between any two points x, y € 2. The 
projector onto a convex set is not only single-valued, but can be characterized by a 
variational inequality (see for instance [12, Theorem 3.14]). As we see in Sect. 23.3.2 
the algorithms considered here are all composed of projectors and reflectors. This 
leads to an analysis of the projectors onto the sets introduced in Sect.23.2. The 
projector onto the measurement sets $M_j$ defined in (23.1) was already discussed
in Sect.6.1.2. The projectors onto the support constraint sets are even simpler. The 
following statement is taken from [1, Example 3.14]. 


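Lemma 23.1 (projectors onto support constraints) Let $y \in \mathbb{C}^n$, $I \subset \{1, 2, \dots, n\}$.
Then the following hold.

\[
P_{\mathcal{S}}\, y = z \quad \text{where} \quad z_j = \begin{cases} y_j & \text{if } j \in I \\ 0 & \text{otherwise} \end{cases} \quad \text{for } 1 \le j \le n, \qquad (23.3)
\]

\[
P_{\mathcal{S}_+} y = z \quad \text{where} \quad z_j = \begin{cases} \max\{\operatorname{Re}(y_j), 0\} & \text{if } j \in I \\ 0 & \text{otherwise} \end{cases} \quad \text{for } 1 \le j \le n. \qquad (23.4)
\]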

The projectors onto other constraint sets can be found, for instance, in [13] or [14] 
for a sparsity constraint, or in [1, Example 3.14] for an amplitude constraint or 
real-valued sparsity constraint. Except for the amplitude and sparsity constraint, all 
other mentioned constraint sets are closed and convex. The type of regularity of the 
constraint sets is later discussed in Remark 23.5.1. 

Another concept closely related to projectors is that of normal cones.


Definition 23.2 (Normal cones) Let $\Omega \subset \mathbb{E}$. Define the cone containing $\Omega$ by
$\operatorname{cone}(\Omega) := \mathbb{R}_+ \cdot \Omega := \{\kappa s \mid \kappa \in \mathbb{R}_+,\ s \in \Omega\}$.

Let $\Omega \subset \mathbb{E}$ and let $x \in \Omega$.

(i) The proximal normal cone of $\Omega$ at $x$ is defined by

\[
N^{\mathrm{prox}}_\Omega(x) = \operatorname{cone}\left(P_\Omega^{-1} x - x\right).
\]

Equivalently, $x^* \in N^{\mathrm{prox}}_\Omega(x)$ whenever there exists $\sigma \ge 0$ such that

\[
\langle x^*, y - x \rangle \le \sigma \|y - x\|^2 \quad (\forall y \in \Omega).
\]

(ii) The limiting (proximal) normal cone of $\Omega$ at $x$ is defined by

\[
N_\Omega(x) = \operatorname*{Lim\,sup}_{z \to x} N^{\mathrm{prox}}_\Omega(z),
\]

where the limit superior is taken in the sense of Painlevé-Kuratowski outer
limit (for more details on the outer limit see for instance [15, Chap. 4]).

When $x \notin \Omega$ all normal cones at $x$ are empty (by definition). If the set $\Omega$ is convex,
the given definitions of the normal cones coincide (see for instance [16]).


23.3.2 Algorithms 


In the context of feasibility problems, a prominent class of iterative algorithms are
projection algorithms. Among these, the most prominent and probably one of the easiest
to compute is the method of cyclic projections as introduced in Sect. 6.2.1. Given
a finite number of closed sets $\Omega_1, \Omega_2, \dots, \Omega_m \subset \mathbb{E}$ and a point, it generates the next
iterate by consecutively projecting onto each of the individual sets. For only two
sets the algorithm reduces to the method of alternating projections. In Sect. 6.2.3 the 
error reduction algorithm was identified with the method of alternating projections 
applied to a measurement and a support constraint. This connection was first made 
by Levi and Stark in [17]. Considering again only two sets, Sect. 6.1.2 introduced the 
well-known Douglas-Rachford algorithm as well as its relaxed version, the relaxed 
averaged alternating reflection algorithm introduced by Luke in [10]. For one mag- 
nitude constraint and a support constraint Douglas-Rachford yields Fienup’s hybrid 
input output method (HIO) [18]. The connection of HIO and Douglas-Rachford was 
already observed by Bauschke et al. [1]. These three algorithms are the ones we want 
to focus on here. Nevertheless, we want to emphasize that the analysis shown below 
can be applied also to other projection methods. 
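
The three fixed point mappings can be written compactly once the projectors are available. The sketch below uses two toy sets in the plane rather than phase retrieval constraints: the unit circle stands in for a nonconvex magnitude-type set and a horizontal line for a convex support-type set. It assumes the commonly used forms $T_{DR} = \tfrac12 (R_A R_B + \mathrm{Id})$ and $T_{RAAR} = \beta\, T_{DR} + (1-\beta) P_B$; which set plays the role of $A$ and which of $B$ is a choice made here for illustration.

```python
import numpy as np


def P_A(x):
    """Projector onto the unit circle (nonconvex); the origin is mapped to one selection."""
    n = np.linalg.norm(x)
    return x / n if n > 0 else np.array([1.0, 0.0])


def P_B(x):
    """Projector onto the horizontal line {x : x[1] = 0.5} (convex)."""
    return np.array([x[0], 0.5])


def reflect(P, x):
    """Reflector R = 2P - Id built from a projector P."""
    return 2 * P(x) - x


def alternating_projections(x):
    return P_A(P_B(x))


def douglas_rachford(x):
    return 0.5 * (reflect(P_A, reflect(P_B, x)) + x)


def raar(x, beta):
    """Relaxed averaged alternating reflections with relaxation parameter beta."""
    return beta * douglas_rachford(x) + (1 - beta) * P_B(x)


x = np.array([0.3, -1.2])
for _ in range(50):
    x = raar(x, beta=0.8)
print(x)
```

With the relaxation parameter set to one, RAAR reduces to Douglas-Rachford, and every point in the intersection of the two sets is a fixed point of all three mappings.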

Our survey is far from complete. Other approaches worthy of mention are several 
of the algorithms discussed in Chap. 5 and those in Chap. 6. Readers familiar with 
the physics literature will also miss the Hybrid Projection Reflection algorithm, [19], 
difference map, [20], solvent flipping algorithm, [21], and Fienup’s Basic Input- 
Output algorithm (BIO). BIO is, in fact, nothing more than Dykstra’s algorithm, see 
[1]. Like the BIO algorithm, most of the known approaches to phase retrieval fit into 
a concise scheme presented in [22]. 


23.3.3 Fixed Points and Regularities of Mappings 


We refer to Fix T as the set of fixed points of the mapping T, i.e. x € Fix T if and 
only if x € Tx. The continuity of set-valued mappings is a well-developed concept 
and follows the familiar patterns of continuity for single-valued functions. One key 
property is nonexpansiveness, which is nothing more than being Lipschitz continuous
with constant 1. That is, given two points, their images under the mapping T are no 
further away from each other than the initial points. A slightly stronger notion than 
nonexpansiveness is averagedness. For set-valued mappings, a finer distinction of 
the types of continuity, whether pointwise, or uniform, for example, is necessary. 
The following definition captures the crucial types of continuity and regularity of 
set-valued mappings that lie at the heart of numerical analysis of algorithms for phase 
retrieval. 


Definition 23.3 (almost nonexpansive/averaged mappings) Let $D \subset \mathbb{E}$ and
$T : D \rightrightarrows \mathbb{E}$.

(i) $T$ is said to be pointwise almost nonexpansive on $D$ at $y \in D$ if there exists a
constant $\varepsilon \in [0, 1)$ such that

\[
\|x^+ - y^+\| \le \sqrt{1 + \varepsilon}\, \|x - y\| \quad (\forall y^+ \in Ty)(\forall x^+ \in Tx)(\forall x \in D). \qquad (23.5)
\]

If (23.5) holds with $\varepsilon = 0$ then $T$ is called pointwise nonexpansive at $y$ on $D$.
If $T$ is pointwise (almost) nonexpansive at every point on a neighborhood of
$y$ (with the same violation constant $\varepsilon$) on $D$, then $T$ is said to be (almost)
nonexpansive at $y$ (with violation $\varepsilon$) on $D$.

If $T$ is pointwise (almost) nonexpansive on $D$ at every point $y \in D$ (with the
same violation constant $\varepsilon$), then $T$ is said to be pointwise (almost) nonexpansive
on $D$ (with violation $\varepsilon$). If $D$ is open and $T$ is pointwise (almost) nonexpansive
on $D$, then it is (almost) nonexpansive on $D$.

(ii) $T$ is called pointwise almost averaged on $D$ at $y$ if there is an averaging constant
$\alpha \in (0, 1)$ and a violation constant $\varepsilon \in [0, 1)$ such that the mapping $\tilde{T}$ defined
by $T = (1 - \alpha)\mathrm{Id} + \alpha \tilde{T}$ is pointwise almost nonexpansive at $y$ with violation
$\varepsilon/\alpha$ on $D$.

Similarly, if $\tilde{T}$ is (pointwise) (almost) nonexpansive on $D$ (at $y$) (with violation
$\varepsilon$), then $T$ is said to be (pointwise) (almost) averaged on $D$ (at $y$) (with averaging
constant $\alpha$ and violation $\alpha\varepsilon$).

If the averaging constant $\alpha = 1/2$, then $T$ is said to be (pointwise) (almost) firmly
nonexpansive on $D$ (with violation $\varepsilon$) (at $y$).


From the above definition it can easily be seen that if a set-valued mapping is non- 
expansive at a point, then it is single-valued there. This is a crucial property for our 
analytical framework, but should not be confused with uniqueness of fixed points: a 
multi-valued operator can be single-valued at its fixed points without having unique 
fixed points. 


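Proposition 23.4 (single-valuedness, Proposition 2.2 of [3]) Let $T : \mathbb{E} \rightrightarrows \mathbb{E}$ be
pointwise almost averaged on $D \subset \mathbb{E}$ at $\bar{x} \in D$ with violation $\varepsilon \ge 0$. Then $T$ is
single-valued at $\bar{x}$. In particular, if $\bar{x} \in \operatorname{Fix} T$, then $T\bar{x} = \{\bar{x}\}$.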


Averaged mappings do not enjoy as nice a calculus as nonexpansive mappings, 
but the next proposition shows that averagedness of some sort is preserved under 
addition and composition. 


Proposition 23.5 (compositions, Proposition 2.4 of [3]) Let $T_j : \mathbb{E} \rightrightarrows \mathbb{E}$ for
$j = 1, 2, \dots, m$ be pointwise almost averaged on $U_j$ at all $y_j \in S_j \subset \mathbb{E}$ with violation
$\varepsilon_j$ and averaging constant $\alpha_j \in (0, 1)$ where $U_j \supset S_j$ for $j = 1, 2, \dots, m$.

(i) If $U := U_1 = U_2 = \cdots = U_m$ and $S := S_1 = S_2 = \cdots = S_m$ then the weighted
mapping $T := \sum_{j=1}^m w_j T_j$ with weights $w_j \in [0, 1]$, $\sum_{j=1}^m w_j = 1$, is pointwise
almost averaged at all $y \in S$ with violation $\varepsilon = \sum_{j=1}^m w_j \varepsilon_j$ and averaging
constant $\alpha = \max_{j=1,2,\dots,m} \{\alpha_j\}$ on $U$.

(ii) If $T_j U_j \subseteq U_{j-1}$ and $T_j S_j \subseteq S_{j-1}$ for $j = 2, 3, \dots, m$, then the composite mapping
$T := T_1 \circ T_2 \circ \cdots \circ T_m$ is pointwise almost averaged at all $y \in S_m$ on $U_m$
with violation at most $\varepsilon = \prod_{j=1}^m (1 + \varepsilon_j) - 1$ and averaging constant at least

\[
\alpha = \frac{m}{m - 1 + \dfrac{1}{\max_{j=1,2,\dots,m} \{\alpha_j\}}}.
\]
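
The constants in Proposition 23.5 are simple to evaluate. The helper functions below are hypothetical and only restate the two formulas; as a sanity check, composing two firmly nonexpansive mappings (averaging constant 1/2, no violation) gives an averaging constant of 2/3.

```python
from math import prod


def weighted_constants(weights, violations, averaging_constants):
    """Violation and averaging constant of a convex combination of mappings."""
    violation = sum(w * e for w, e in zip(weights, violations))
    return violation, max(averaging_constants)


def composite_constants(violations, averaging_constants):
    """Bounds on violation and averaging constant of a composition of m mappings."""
    m = len(violations)
    violation = prod(1 + e for e in violations) - 1
    averaging = m / (m - 1 + 1 / max(averaging_constants))
    return violation, averaging


print(composite_constants([0.0, 0.0], [0.5, 0.5]))
print(weighted_constants([0.5, 0.5], [0.1, 0.3], [0.5, 0.7]))
```

The violation of a composition grows multiplicatively, so small violations of the individual projectors remain small only when few mappings are composed.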


23.4 A Toolkit for Convergence 


With the characterization of algorithms as simply self mappings with certain reg- 
ularity properties, we show in this section how those properties come together to 
guarantee convergence of the algorithm iterations to fixed points. The fixed points 
need not be solutions to the feasibility problem (indeed, this does not exist for phase 
retrieval) but will in general be a point that allows one to compute another point that 
does have some physical significance, such as a local best approximation point. 

It turns out that convergence itself is provided by the regularity properties introduced
in Sect. 23.3.3. The basic convergence idea goes back to Opial [4]. It says that averagedness
of a single-valued mapping $T$ and nonemptiness of the fixed point set
imply convergence of the iterative sequence $(T^k x^0)_{k\in\mathbb{N}}$ to a point in $\operatorname{Fix} T$ for any
$x^0 \in \mathbb{E}$. In other words, averagedness of $T$ and a nonempty fixed point
set are enough to get convergence. As one would expect, it can be difficult for a map
to satisfy these properties globally, and this is often the situation in nonconvex
problem instances. Thus, we seek a statement that requires only local properties, which
in our case is pointwise almost averagedness as introduced in Definition 23.3.

But convergence alone for iterative procedures is not enough: eventually one has
to stop the iteration, and without knowing the rate of convergence it is impossible to
estimate how far a given iterate is from the solution. A quantitative convergence
analysis is achieved with the second essential property: metric (sub-)regularity. This
concept has been studied by many authors in the literature (see for instance [5-7, 15,
23, 24]). For the definition of metric regularity we need gauge functions. A function
$\mu : [0, \infty) \to [0, \infty)$ is a gauge function if it is continuous and strictly increasing
with $\mu(0) = 0$ and $\lim_{t \to \infty} \mu(t) = \infty$. The following definition is taken from [3,
Definition 2.5].


Definition 23.6 (metric regularity on a set) Let $\Phi : \mathbb{E} \rightrightarrows \mathbb{Y}$, $U \subset \mathbb{E}$, $V \subset \mathbb{Y}$. The
mapping $\Phi$ is called metrically regular with gauge $\mu$ on $U \times V$ relative to $\Lambda \subset \mathbb{E}$ if

\[
\operatorname{dist}\left(x, \Phi^{-1}(y) \cap \Lambda\right) \le \mu\left(\operatorname{dist}(y, \Phi(x))\right) \qquad (23.6)
\]

holds for all $x \in U \cap \Lambda$ and $y \in V$ with $0 < \mu(\operatorname{dist}(y, \Phi(x)))$. When the set $V$
consists of a single point, $V = \{\bar{y}\}$, then $\Phi$ is said to be metrically subregular for $\bar{y}$
on $U$ with gauge $\mu$ relative to $\Lambda \subset \mathbb{E}$.

When $\mu$ is a linear function (that is, $\mu(t) = \kappa t$, $\forall t \in [0, \infty)$) one says “with constant
$\kappa$” instead of “with gauge $\mu(t) = \kappa t$”. When $\Lambda = \mathbb{E}$, the quantifier “relative
to” is dropped. When $\mu$ is linear, the smallest constant $\kappa$ for which (23.6) holds is
called the modulus of metric regularity.
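
A toy example, not taken from the chapter, makes the definition concrete. Consider alternating projections between the unit circle and the line $\{(a, 1.5)\}$, which do not intersect. Written in the coordinate $a$ along the line, the iteration map is $T(a) = a/\sqrt{a^2 + 1.5^2}$ with the single fixed point $a = 0$. For $\Phi = T - \mathrm{Id}$, metric subregularity for $0$ with a linear gauge asks for $|a| \le \kappa\, |T(a) - a|$; the sketch below checks the candidate constant $\kappa = 3$ on a grid.

```python
import numpy as np


def T(a):
    """Alternating projections between the unit circle and the line {y = 1.5},
    written in the coordinate a along the line (toy example)."""
    return a / np.sqrt(a**2 + 1.5**2)


a = np.linspace(-2.0, 2.0, 401)
residual = np.abs(T(a) - a)        # dist(0, Phi(a)) with Phi = T - Id
distance = np.abs(a)               # dist(a, Fix T), since Fix T = {0}
kappa = 3.0                        # candidate constant for a linear gauge

holds = np.all(distance <= kappa * residual + 1e-12)
print(holds)
```

The inequality holds because $1/\sqrt{a^2 + 1.5^2} \le 2/3$, so the residual is at least $|a|/3$; a smaller constant fails close to the fixed point.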


While this definition might seem abstract, there are properties that directly imply
metric regularity, or reformulations that allow one to prove metric regularity. One of
these is polyhedrality (see [3, Proposition 2.6]). A mapping $T : \mathbb{E} \rightrightarrows \mathbb{E}$ is called
polyhedral if its graph is the union of finitely many sets that can be expressed as the
intersection of finitely many closed half-spaces and/or hyperplanes [5].

Collecting the concepts we have established so far, we present the following 
convergence result that goes back to Luke et al. in [3, Theorem 2.2] and was later 
refined in [25] by Luke et al. to convergence to a specific point. 


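Theorem 23.4.1 (basic convergence template with metric regularity) Let $T : \Lambda \rightrightarrows \Lambda$
for $\Lambda \subset \mathbb{E}$, $\Phi := T - \mathrm{Id}$ and let $S \subset \operatorname{ri} \Lambda$ be closed and nonempty with
$TS \subset \operatorname{Fix} T \cap S$. Denote $(S + \delta\mathbb{B}) \cap \Lambda$ by $S_\delta$ for a nonnegative real $\delta$. Suppose that, for
all $\bar\delta > 0$ small enough, there are $\gamma \in (0, 1)$, a nonnegative scalar $\varepsilon_i$ and a positive
constant $\alpha_i$ bounded above by 1, such that,

(i) $T$ is pointwise almost averaged at all $y \in S$ with averaging constant $\alpha_i$ and
violation $\varepsilon_i$ on $S_{\gamma^i \bar\delta}$, and

(ii) for $R_i := S_{\gamma^i \bar\delta} \setminus \left(\operatorname{Fix} T \cap S + \gamma^{i+1} \bar\delta\, \mathbb{B}\right)$,

  (i)
  \[
  \operatorname{dist}(x, S) \le \operatorname{dist}\left(x, \Phi^{-1}(\bar{y}) \cap \Lambda\right)
  \]
  for all $x \in R_i$ and $\bar{y} \in \Phi(P_S x) \setminus \Phi(x)$,

  (ii) $\Phi$ is metrically regular with gauge $\mu_i$ relative to $\Lambda$ on $R_i \times \Phi(P_S(R_i))$,
  where $\mu_i$ satisfies

  \[
  \sup_{x \in R_i,\ \bar{y} \in \Phi(P_S(R_i)),\ \bar{y} \notin \Phi(x)} \frac{\mu_i\left(\operatorname{dist}(\bar{y}, \Phi(x))\right)}{\operatorname{dist}(\bar{y}, \Phi(x))} \le \kappa_i < \sqrt{\frac{1 - \alpha_i}{\varepsilon_i \alpha_i}}. \qquad (23.7)
  \]

Then, for any $x^0 \in \Lambda$ close enough to $S$, the iterates $x^{j+1} \in Tx^j$ satisfy
$\operatorname{dist}(x^j, \operatorname{Fix} T \cap S) \to 0$ and

\[
\operatorname{dist}\left(x^{i+1}, \operatorname{Fix} T \cap S\right) \le c_i \operatorname{dist}\left(x^i, S\right) \quad (\forall x^i \in R_i), \qquad (23.8)
\]

where $c_i := \sqrt{1 + \varepsilon_i - \dfrac{1 - \alpha_i}{\kappa_i^2 \alpha_i}} < 1$.

In particular, if $\kappa_i \le \bar\kappa < \sqrt{\dfrac{1 - \bar\alpha}{\bar\varepsilon \bar\alpha}}$ for all $i$ large enough, then convergence is
eventually at least R-linear with rate at most $\bar{c} := \sqrt{1 + \bar\varepsilon - \dfrac{1 - \bar\alpha}{\bar\kappa^2 \bar\alpha}}$ to some point
in $\operatorname{Fix} T \cap S$. If $S \cap \Lambda$ is a singleton, then (iii) is redundant and convergence is
Q-linear.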


In both Opial’s original statement and Theorem 23.4.1, averagedness is the
essential property for convergence of iterative algorithms, whereas assumption (ii)
of Theorem 23.4.1 serves to quantify the convergence.
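
To get a feeling for the constants in Theorem 23.4.1, the following minimal sketch evaluates the rate $c = \sqrt{1 + \epsilon - (1-\alpha)/(\kappa^2\alpha)}$ and compares it with the observed rate of alternating projections between two lines through the origin in $\mathbb{R}^2$. The toy problem, the function names and the values $\alpha = 2/3$ (composition of two projectors onto convex sets), $\epsilon = 0$ and $\kappa = 1/\sin^2\theta$ (worked out by hand for this example, relative to the first line) are assumptions made for illustration and are not taken from [3].

```python
import math

def linear_rate(eps, alpha, kappa):
    """Rate c = sqrt(1 + eps - (1 - alpha) / (kappa^2 alpha)).

    Returns None when kappa violates the bound kappa < sqrt((1 - alpha) / (eps alpha)).
    """
    bound = math.inf if eps == 0 else math.sqrt((1 - alpha) / (eps * alpha))
    if kappa >= bound:
        return None
    radicand = 1 + eps - (1 - alpha) / (kappa ** 2 * alpha)
    return math.sqrt(max(radicand, 0.0))  # clamp guards against inconsistent inputs

def alternating_projections(theta, t0=1.0, steps=20):
    """Iterate T = P_A P_B for A = span{(1, 0)} and B = span{(cos theta, sin theta)}."""
    b = (math.cos(theta), math.sin(theta))
    x = (t0, 0.0)
    dists = [math.hypot(x[0], x[1])]       # distance to the unique fixed point 0
    for _ in range(steps):
        s = x[0] * b[0] + x[1] * b[1]      # coefficient of P_B x along b
        x = (s * b[0], 0.0)                # P_A keeps only the first coordinate
        dists.append(math.hypot(x[0], x[1]))
    return dists

theta = math.pi / 4
dists = alternating_projections(theta)
observed = dists[-1] / dists[-2]
predicted = linear_rate(eps=0.0, alpha=2 / 3, kappa=1 / math.sin(theta) ** 2)
print(f"observed rate {observed:.4f}, bound from the rate formula {predicted:.4f}")
```

In this toy case the observed contraction factor is $\cos^2\theta$, while the formula gives the more conservative value $\sqrt{1 - \tfrac{1}{2}\sin^4\theta}$; the bound is valid but not tight.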


23.5 Regularities of Sets and Their Collection 


In this section we connect the regularities of sets to regularities of the projectors
onto these, which affect the regularity of the mapping $T$. When dealing with nonconvex
sets there are numerous set-regularity definitions available. A recent survey by
Kruger et al. [26] sorted the different classes of nonconvex sets to highlight their
dependencies and differences. Uniting several concepts of regularity, we propose to
use the notion of $\epsilon$-set regularity as introduced in [26] and refined in [27].


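Definition 23.7 ($\epsilon$-set regularity) Let $\Omega \subset \mathbb{E}$ be nonempty and let $\bar x \in \Omega$. The set
$\Omega$ is said to be $\epsilon$-subregular relative to $\Lambda$ at $\bar x$ for $(\bar y, \bar v) \in \operatorname{gph}(N_\Omega)$ if it is locally
closed at $\bar x$ and there exists an $\epsilon > 0$ together with a neighborhood $U$ of $\bar x$ such that

\[
\langle \bar v - (x - x^+),\ x^+ - \bar y \rangle \le \epsilon\,\|\bar v - (x - x^+)\|\,\|x^+ - \bar y\| \quad (\forall x \in \Lambda \cap U)(\forall x^+ \in P_\Omega x). \tag{23.9}
\]

If for every $\epsilon > 0$ there is a neighborhood (depending on $\epsilon$) such that (23.9) holds,
then $\Omega$ is said to be subregular relative to $\Lambda$ at $\bar x$ for $(\bar y, \bar v) \in \operatorname{gph}(N_\Omega)$. If $\Lambda = \{\bar x\}$,
then the qualifier “relative to” is dropped.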


In the phase retrieval problem, one type of nonconvexity that is also covered by
$\epsilon$-subregularity is prox-regularity.


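Definition 23.8 (prox-regular sets) A closed set $\Omega$ is prox-regular at $\bar x \in \Omega$ if for
$\bar v \in N_\Omega(\bar x)$ there exist $\epsilon, \delta > 0$ such that

\[
\frac{\epsilon}{2}\,\|x - c\|^2 \ge \langle v, x - c \rangle \quad (\forall x, c \in \Omega \cap \mathbb{B}_\delta(\bar x))(\forall v \in N_\Omega(c) \cap \mathbb{B}_\delta(0)).
\]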


This definition dates back to Federer [28], who called such sets sets with positive
reach. The definition presented here is taken from [29, Proposition 1.2]. The authors
of [29] showed that their definition of prox-regularity at $\bar x \in \Omega$ is equivalent to several
other statements. One of the most prominent is local single-valuedness of the
projector around $\bar x$ [29, Theorem 1.3]. Kruger et al. showed that prox-regularity implies
$\epsilon$-subregularity in [26, Proposition 4(vi)]. As the next remark shows, all constraint
sets involved in the phase retrieval problem are, in fact, prox-regular.

Remark 23.5.1 (phase retrieval constraint sets are prox-regular) Of great importance
for the convergence analysis of the introduced algorithms is the
$\epsilon$-subregularity of the measurement sets defined in (23.1). By [3, Example 3.1.b]
circles are subregular at any of their points $\bar x$ for all $(\bar x, v)$ in the graph of the normal
cone of the sets. As mentioned before, $\epsilon$-subregularity covers a diverse range of
regularity notions for sets. The measurement sets investigated here are in fact shown
to be semi-algebraic [30, Proposition 3.5] and prox-regular by [29, Theorem 1.3]
and (6.11).

The other sets that are involved in the phase retrieval problem are the qualitative
constraints introduced in (23.2) or mentioned before. Except for the amplitude
constraint and the sparsity constraint, all of these sets are convex and thus, by
[3, Proposition 3.1 (vii)], subregular. Fortunately, the amplitude constraint describes
coordinatewise circles when the other coordinates are fixed, like the measurement
constraint. Hence, the amplitude constraint is $\epsilon$-subregular as well (and additionally
semi-algebraic and prox-regular). The sparsity constraint $A_s$ is prox-regular at all
points $x$ satisfying $\|x\|_0 = s$ (similar to the proof in [14, Proposition 4.4]).


By [12, Proposition 4.8] the projector onto a closed convex set is averaged with constant
$\alpha = 1/2$. Allowing sets to have a more general regularity, here prox-regularity,
yields regularity of the projectors as well.


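Proposition 23.9 (projectors and reflectors onto prox-regular sets) Let $\Omega \subset \mathbb{E}$ be
nonempty closed, and let $U$ be a neighborhood of $\bar x \in \Omega$. Let $\Lambda \subset \Omega \cap U$. If $\Omega$ is
prox-regular at $\bar x$ with constant $\epsilon$ on the neighborhood $U$, then the following hold.

(i) Let $\epsilon \in [0, 1)$. The projector $P_\Omega$ is pointwise almost firmly nonexpansive at
each $y \in \Lambda$ with violation $\tilde\epsilon_1 := 2\epsilon + 2\epsilon^2$ on $U$. That is, at each $y \in \Lambda$

\[
\|x - y\|^2 + \|x' - x\|^2 \le (1 + \tilde\epsilon_1)\,\|x' - y\|^2 \quad (\forall x' \in U)(\forall x \in P_\Omega x').
\]

(ii) The reflector $R_\Omega$ is pointwise almost nonexpansive at each $y \in \Lambda$ with violation
$\tilde\epsilon_2 := 4\epsilon + 4\epsilon^2$ on $U$; that is, for all $y \in \Lambda$

\[
\|x - y\| \le \sqrt{1 + \tilde\epsilon_2}\,\|x' - y\| \quad (\forall x' \in U)(\forall x \in R_\Omega x').
\]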


Proof By [26, Proposition 4(vi)] prox-regularity of $\Omega$ at $\bar x$ implies that the set $\Omega$ is
$\epsilon$-subregular at $\bar x$ for all $(c, v) \in \operatorname{gph} N_\Omega$, where $c \in U$. The result then follows from
[3, Theorem 3.1].
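
The inequality in Proposition 23.9 (i) can be probed numerically for the simplest nonconvex constraint appearing in phase retrieval, the unit circle in $\mathbb{R}^2$. The sketch below samples points $x'$ in a small polar box around $y = (1, 0)$ and records the smallest violation for which the almost firm nonexpansiveness inequality holds. The sampling scheme, the function names and the comparison value $\delta/(1-\delta)$ (an elementary hand computation for this toy case) are assumptions for illustration; the code does not compute the prox-regularity constant $\epsilon$ of the proposition.

```python
import math
import random

def project_circle(p):
    """Projection of a nonzero point p onto the unit circle."""
    r = math.hypot(p[0], p[1])
    return (p[0] / r, p[1] / r)

def dist2(p, q):
    """Squared Euclidean distance in the plane."""
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2

def violation(y, x_prime):
    """Smallest eps with |x - y|^2 + |x' - x|^2 <= (1 + eps) |x' - y|^2 for x = P(x')."""
    x = project_circle(x_prime)
    return (dist2(x, y) + dist2(x_prime, x)) / dist2(x_prime, y) - 1.0

def max_violation(delta, samples=20000, seed=0):
    """Largest sampled violation at y = (1, 0) over a polar box of half-width delta."""
    rng = random.Random(seed)
    y = (1.0, 0.0)
    worst = 0.0
    for _ in range(samples):
        r = 1.0 + rng.uniform(-delta, delta)
        t = rng.uniform(-delta, delta)
        x_prime = (r * math.cos(t), r * math.sin(t))
        if dist2(x_prime, y) < 1e-12:      # skip points numerically equal to y
            continue
        worst = max(worst, violation(y, x_prime))
    return worst

for delta in (0.1, 0.01):
    print(delta, max_violation(delta), delta / (1.0 - delta))
```

Points inside the circle produce a strictly positive violation, which reflects the nonconvexity of the set, and the violation shrinks with the size of the neighborhood.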


Note that Proposition 23.9 presents a special case of [3, Theorem 3.1], where 
the authors allowed their sets to be e-subregular for certain normal vectors. By 
Proposition 23.5 compositions and convex combinations of averaged mappings are 
again averaged. Combining this with Proposition 23.9 implies that compositions of 
projectors are averaged. Thus, the algorithms presented in Sect. 23.3.2 are pointwise 
almost averaged as we see in Sect. 23.7. 

Whereas the regularity of the individual sets implies almost averagedness of the
mapping $T$, metric regularity relies on the regularity of the whole collection of sets
$\{\Omega_1, \Omega_2, \dots, \Omega_m\}$. The idea of regularities of collections of sets traces back to [26,
Theorem 3] by Kruger, Luke and Thao, but the analysis there covers only consistent
feasibility problems, i.e. problems in which the intersection of the sets is nonempty. A generalized notion of
subtransversality proposed in [3, Definition 3.2] includes inconsistent settings too.


Definition 23.10 (subtransversal collection of sets) Let $\{\Omega_1, \dots, \Omega_m\}$ be a collection
of nonempty closed subsets of $\mathbb{E}$ and define $\Psi : \mathbb{E}^m \rightrightarrows \mathbb{E}^m$ by
$\Psi(x) := P_\Omega(\Pi x) - \Pi x$ where $\Omega := \Omega_1 \times \Omega_2 \times \cdots \times \Omega_m$, the projection $P_\Omega$ is with respect
to the Euclidean norm on $\mathbb{E}^m$ and $\Pi : x = (x_1, x_2, \dots, x_m) \mapsto (x_2, x_3, \dots, x_m, x_1)$
is the permutation mapping on the product space $\mathbb{E}^m$ for $x_j \in \mathbb{E}$ ($j = 1, 2, \dots, m$).
Let $\bar x = (\bar x_1, \bar x_2, \dots, \bar x_m) \in \mathbb{E}^m$ and $\bar y \in \Psi(\bar x)$. The collection of sets is said to be
subtransversal with gauge $\mu$ relative to $\Lambda \subset \mathbb{E}^m$ at $\bar x$ for $\bar y$ if $\Psi$ is metrically subregular
at $\bar x$ for $\bar y$ on some neighborhood $U$ of $\bar x$ (metrically regular on $U \times \{\bar y\}$) with
gauge $\mu$ relative to $\Lambda$. As in Definition 23.6, when $\mu(t) = \kappa t$, $\forall t \in [0, \infty)$, one says
“constant $\kappa$” instead of “gauge $\mu(t) = \kappa t$”. When $\Lambda = \mathbb{E}^m$, the qualifier “relative to”
is dropped.


In [3, Proposition 3.3] Luke et al. showed that for a consistent feasibility problem sub- 
transversality of the collection of sets is equivalent to what is elsewhere recognized 
as linear regularity of the collection [31]. 
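
The mapping $\Psi$ of Definition 23.10 is easy to evaluate once the individual projectors are available. The following sketch does so for an inconsistent toy problem in $\mathbb{R}^2$ with $m = 2$, the unit circle and the line $\{(a, b) : b = 2\}$; both sets and all function names are hypothetical choices for illustration. At a pair $\bar x$ of nearest points, $\Psi(\bar x)$ returns exactly the gap vector $\bar x - \Pi\bar x$, which is the reference point used for subtransversality in the inconsistent case.

```python
import math

def proj_circle(p):
    """Projection of a nonzero point onto the unit circle (Omega_1)."""
    r = math.hypot(p[0], p[1])
    return (p[0] / r, p[1] / r)

def proj_line(p, height=2.0):
    """Projection onto the horizontal line {(a, b): b = height} (Omega_2)."""
    return (p[0], height)

def permute(x):
    """Pi: (x_1, x_2, ..., x_m) -> (x_2, ..., x_m, x_1)."""
    return x[1:] + x[:1]

def psi(x, projectors):
    """Psi(x) = P_Omega(Pi x) - Pi x, evaluated block by block on the product space."""
    shifted = permute(x)
    return tuple(
        tuple(pc - sc for pc, sc in zip(P(s), s))
        for P, s in zip(projectors, shifted)
    )

projectors = (proj_circle, proj_line)
x_bar = ((0.0, 1.0), (0.0, 2.0))          # nearest points of the circle and the line
zeta_bar = tuple(
    tuple(a - b for a, b in zip(u, v)) for u, v in zip(x_bar, permute(x_bar))
)
print(psi(x_bar, projectors), zeta_bar)
```

For a consistent problem the same construction gives $0 \in \Psi(\bar x)$ at any $\bar x = (p, p)$ with $p$ in the intersection of the sets.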


23.6 Analysis of Cyclic Projections 


Having introduced the main tools for convergence, this section is devoted to an 
explicit demonstration of how this framework can be applied. In particular, we present 
the main steps of the convergence analysis of the cyclic projection mapping as done 
by Luke et al. in [3]. 

As introduced in Algorithm 6.2.1, the method of cyclic projections on a finite collection
of closed subsets of $\mathbb{E}$, $\{\Omega_1, \Omega_2, \dots, \Omega_m\}$ ($m \ge 2$), is defined by the mapping

\[
P_{\Omega_1} P_{\Omega_2} \cdots P_{\Omega_{m-1}} P_{\Omega_m}, \tag{23.10}
\]

that we denote for notational simplicity by $P_0$. For an initial point $u^0$ the algorithm
generates a sequence $(u^k)_{k \in \mathbb{N}}$ by $u^{k+1} \in P_0 u^k$. For the analysis of $P_0$ it is convenient
to introduce some auxiliary sets. We denote by $\Omega$ the product of the sets $\Omega_j$ on $\mathbb{E}^m$,
$\Omega := \Omega_1 \times \Omega_2 \times \cdots \times \Omega_m$.

Let $\bar u \in \mathrm{Fix}\,P_0$ and fix $\bar\zeta \in Z(\bar u)$ where

\[
Z(u) := \{\zeta := z - \Pi z \mid z \in W_0 \subset \mathbb{E}^m,\ z_1 = u\} \tag{23.11}
\]

for

\[
W_0 := \{x \in \mathbb{E}^m \mid x_m \in P_{\Omega_m} x_1,\ x_j \in P_{\Omega_j} x_{j+1},\ j = 1, 2, \dots, m-1\}. \tag{23.12}
\]

Note that $\sum_{j=1}^m \bar\zeta_j = 0$. The elements of $W_0$ are all cycles of the cyclic projection
method, where each coordinate of $x$ corresponds to an inner iterate of $P_0$. The first
coordinate $x_1$ of $x \in W_0$ is, thus, a fixed point of $P_0$. The vectors $\zeta \in Z(u)$ are
called difference vectors. Their coordinate entries provide information about the
gaps between the inner iterates of a cycle of the mapping $P_0$.
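
As a small illustration of cycles and difference vectors, the sketch below runs cyclic projections on three lines in $\mathbb{R}^2$ that intersect pairwise but have no common point. The three lines and the function names are hypothetical and chosen so that all quantities can be checked by hand; the sets are convex, so the projectors are single-valued.

```python
def p1(p):
    """Projection onto Omega_1 = {(a, b): b = 0}."""
    return (p[0], 0.0)

def p2(p):
    """Projection onto Omega_2 = {(a, b): a = 0}."""
    return (0.0, p[1])

def p3(p):
    """Projection onto Omega_3 = {(a, b): a + b = 1}."""
    s = (p[0] + p[1] - 1.0) / 2.0
    return (p[0] - s, p[1] - s)

def cycle_from(u):
    """Inner iterates (x_1, x_2, x_3) of one sweep P_0 = P_1 P_2 P_3 started at u."""
    x3 = p3(u)
    x2 = p2(x3)
    x1 = p1(x2)
    return (x1, x2, x3)

def difference_vector(x):
    """zeta = x - Pi x, that is zeta_j = x_j - x_{j+1} with x_{m+1} := x_1."""
    m = len(x)
    return tuple(
        (x[j][0] - x[(j + 1) % m][0], x[j][1] - x[(j + 1) % m][1])
        for j in range(m)
    )

u = (3.0, -2.0)
for _ in range(5):
    u = cycle_from(u)[0]                  # u^{k+1} = P_0 u^k

cycle = cycle_from(u)                     # an element of W_0 with first coordinate u
zeta = difference_vector(cycle)
total = (sum(z[0] for z in zeta), sum(z[1] for z in zeta))
print(u, cycle, zeta, total)
```

Here the fixed point of $P_0$ is the origin, the cycle visits $(0, 0)$, $(0, 0.5)$ and $(0.5, 0.5)$, and the entries of the difference vector sum to zero, as noted after (23.12).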

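To monitor the inner iterations, we consider the cyclic projection algorithm lifted
to the product space $\mathbb{E}^m$. That is, generate the sequence $(x^k)_{k \in \mathbb{N}}$ by $x^{k+1} \in T_{\bar\zeta}\, x^k$ with

\[
T_{\bar\zeta} : \mathbb{E}^m \rightrightarrows \mathbb{E}^m : x \mapsto \Bigl\{ \Bigl( x_1^+,\ x_1^+ - \bar\zeta_1,\ \dots,\ x_1^+ - \sum_{j=1}^{m-1} \bar\zeta_j \Bigr) \Bigm| x_1^+ \in P_0 x_1 \Bigr\} \tag{23.13}
\]

for $\bar\zeta \in Z(\bar u)$ where $\bar u \in \mathrm{Fix}\,P_0$. Thus, the first entry of $T_{\bar\zeta}\, x$ belongs to the cyclic
projection mapping $P_0$, whereas the other entries of $T_{\bar\zeta}\, x$ indicate how close or
distant $x^k$ is from a certain cycle specified by $\bar\zeta$. In order to isolate cycles, we restrict
our attention to relevant subsets of $\mathbb{E}^m$. These are

\[
W(\bar\zeta) := \{x \in \mathbb{E}^m \mid x - \Pi x = \bar\zeta\}, \tag{23.14}
\]
\[
L := \text{an affine subspace with } T_{\bar\zeta} : L \rightrightarrows L, \tag{23.15}
\]
\[
\Lambda := L \cap W(\bar\zeta). \tag{23.16}
\]

The set $W(\bar\zeta)$ contains all points whose entries have a certain distance to each other,
namely $\bar\zeta_j$. In particular, $W(\bar\zeta)$ contains all fixed points of $T_{\bar\zeta}$. The affine subspace $L$
is used to restrict the analysis to an affine subspace that contains the iterates $x^k$ of $T_{\bar\zeta}$.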

To apply the convergence framework, Theorem 23.4.1, there are two major steps
we have to take. First, we have to show that the mapping is averaged. Since the cyclic
projection mapping is, as its name suggests, a composition of projectors, averagedness
is not hard to show using the concepts presented in Sect. 23.5. Second, metric subregularity
needs to be proven. For this, we state an auxiliary result that relates metric
subregularity to subtransversality of the collection of sets (see [3, Proposition 3.4]).


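Proposition 23.11 (metric subregularity of cyclic projections) Let $\bar u \in \mathrm{Fix}\,P_0$ and
$\bar\zeta \in Z(\bar u)$ and let $\bar x = (\bar x_1, \bar x_2, \dots, \bar x_m) \in W_0$ satisfy $\bar\zeta = \bar x - \Pi\bar x$ with $\bar x_1 = \bar u$. For
$L$ an affine subspace containing $\bar x$, let $T_{\bar\zeta} : L \rightrightarrows L$ and define the mappings
$\Phi_{\bar\zeta} := T_{\bar\zeta} - \mathrm{Id}$ and $\Psi := (P_\Omega - \mathrm{Id}) \circ \Pi$. Suppose the following hold:

(i) the collection of sets $\{\Omega_1, \Omega_2, \dots, \Omega_m\}$ is subtransversal at $\bar x$ for $\bar\zeta$ relative to
$\Lambda := L \cap W(\bar\zeta)$ with constant $\kappa$ and neighborhood $U$ of $\bar x$;
(ii) there exists a positive constant $\sigma$ such that

\[
\operatorname{dist}(\bar\zeta, \Psi(x)) \le \sigma \operatorname{dist}(0, \Phi_{\bar\zeta}(x)), \quad \forall x \in \Lambda \cap U \text{ with } x_1 \in \Omega_1.
\]

Then $\Phi_{\bar\zeta}$ is metrically subregular for 0 on $U$ (metrically regular on $U \times \{0\}$) relative
to $\Lambda$ with constant $\bar\kappa = \kappa\sigma$.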


Proposition 23.11 indicates that subtransversality plus the additional assumption (ii)
are enough to deduce metric subregularity of $\Phi_{\bar\zeta} := T_{\bar\zeta} - \mathrm{Id}$ as required in Theorem
23.4.1. Using this connection and the development in Sect. 23.5 about almost averagedness,
we can state the following convergence result, which is a consequence of
Theorem 23.4.1.


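Theorem 23.6.1 (convergence of cyclic projections) Let $S_0 \subset \mathrm{Fix}\,P_0 \ne \emptyset$ and
$Z := \bigcup_{u \in S_0} Z(u)$. Define

\[
S_j := \bigcup_{\zeta \in Z} \Bigl( S_0 - \sum_{i=1}^{j-1} \zeta_i \Bigr), \quad j = 1, 2, \dots, m. \tag{23.17}
\]

Let $U := U_1 \times U_2 \times \cdots \times U_m$ be a neighborhood of $S := S_1 \times S_2 \times \cdots \times S_m$ and
suppose that

\[
P_{\Omega_j}\Bigl( u - \sum_{i=1}^{j} \zeta_i \Bigr) \subseteq S_0 - \sum_{i=1}^{j-1} \zeta_i, \quad \forall u \in S_0,\ \forall \zeta \in Z \text{ for each } j = 1, 2, \dots, m, \tag{23.18}
\]
\[
P_{\Omega_j} U_{j+1} \subseteq U_j \text{ for each } j = 1, 2, \dots, m \quad (U_{m+1} := U_1). \tag{23.19}
\]

For fixed $\bar\zeta \in Z$ and $\bar x \in S$ with $\bar\zeta = \bar x - \Pi\bar x$, generate the sequence $(x^k)_{k \in \mathbb{N}}$ by
$x^{k+1} \in T_{\bar\zeta}\, x^k$ for $T_{\bar\zeta}$ defined by (23.13), seeded by a point $x^0 \in W(\bar\zeta) \cap U$ for $W(\bar\zeta)$
defined by (23.14) with $x_1^0 \in \Omega_1 \cap U_1$.

Suppose that, for $\Lambda := L \cap \operatorname{aff}\bigl(\bigcup_{\zeta \in Z} W(\zeta)\bigr) \supset S$ such that $T_\zeta : \Lambda \rightrightarrows \Lambda$ for all
$\zeta \in Z$ and an affine subspace $L \supset \operatorname{aff}\bigl((x^k)_{k \in \mathbb{N}}\bigr)$, the following hold:

(i) $\Omega_j$ is prox-regular at all $\hat x_j \in S_j$ with constant $\epsilon_j \in (0, 1)$ on the neighborhood
$U_j$ for $j = 1, 2, \dots, m$;

(ii) for each $\hat x = (\hat x_1, \hat x_2, \dots, \hat x_m) \in S$, the collection of sets $\{\Omega_1, \Omega_2, \dots, \Omega_m\}$
is subtransversal at $\hat x$ for $\hat\zeta := \hat x - \Pi\hat x$ relative to $\Lambda$ with constant $\kappa$ on the
neighborhood $U$;

(iii) for $\Phi_{\hat\zeta} := T_{\hat\zeta} - \mathrm{Id}$ and $\Psi := (P_\Omega - \mathrm{Id}) \circ \Pi$ there exists a positive constant $\sigma$
such that for all $\hat\zeta \in Z$

\[
\operatorname{dist}(\hat\zeta, \Psi(x)) \le \sigma \operatorname{dist}(0, \Phi_{\hat\zeta}(x))
\]

holds whenever $x \in \Lambda \cap U$ with $x_1 \in \Omega_1$;

(iv) $\operatorname{dist}(x, S) \le \operatorname{dist}\bigl(x, \Phi_{\hat\zeta}^{-1}(0) \cap \Lambda\bigr)$ for all $x \in U \cap \Lambda$, for all $\hat\zeta \in Z$.

Then the sequence $(x^k)_{k \in \mathbb{N}}$ initialized by a point $x^0 \in W(\bar\zeta) \cap U$ with $x_1^0 \in \Omega_1 \cap U_1$
satisfies

\[
\operatorname{dist}(x^{k+1}, \mathrm{Fix}\,T_{\bar\zeta} \cap S) \le c \operatorname{dist}(x^k, S)
\]

whenever $x^k \in U$ with $c := \sqrt{1 + \bar\epsilon - \frac{1-\bar\alpha}{\bar\kappa^2\bar\alpha}}$ where $\bar\epsilon := \prod_{j=1}^m (1 + \tilde\epsilon_j) - 1$,
$\tilde\epsilon_j := \frac{4\epsilon_j(1+\epsilon_j)}{(1-\epsilon_j)^2}$, $\bar\alpha := \frac{m}{m+1}$ and $\bar\kappa := \kappa\sigma$. If, in addition, $\bar\kappa < \sqrt{\frac{1-\bar\alpha}{\bar\epsilon\bar\alpha}}$, then
$\operatorname{dist}(x^k, \mathrm{Fix}\,T_{\bar\zeta} \cap S) \to 0$, and hence $\operatorname{dist}(x_1^k, \mathrm{Fix}\,P_0 \cap S_1) \to 0$ at least linearly with
rate $c < 1$.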


Proof This is a special case of [3, Theorem 3.2] when the sets are prox-regular. 


Remark 23.6.2 Theorem 23.6.1 is rather long and technical at first sight, though
the pieces are easily parsed. Equations (23.17)-(23.19) force the iterations to stay in
specific neighborhoods. This is needed to apply Proposition 23.9 with the help of
(i) to deduce pointwise almost averagedness of $P_0$ and likewise of $T_{\bar\zeta}$. Assumptions
(ii) and (iii) then yield metric subregularity of $\Phi_{\bar\zeta} := T_{\bar\zeta} - \mathrm{Id}$ by Proposition 23.11.
This is where the construction in the product space comes into play. Working on
$\mathbb{E}^m$, we were able to use subtransversality to show metric subregularity of $\Phi_{\bar\zeta}$. It is
worth mentioning that, until now, we were not able to show metric subregularity for
the mapping directly associated to $P_0$. Adding assumption (iv) in Theorem 23.6.1,
we can finally apply Theorem 23.4.1 and deduce convergence of $T_{\bar\zeta}$ with the given
constants. At this point the definition of $T_{\bar\zeta}$ becomes crucial. Since the first entry
of the sequence $x^k$ generated by the mapping $T_{\bar\zeta}$ is nothing more than the result of applying
the method of cyclic projections $P_0$, convergence of $x^k$ implies convergence of $x_1^k$,
that is, the sequence generated by cyclic projections. In [25] Luke et al. discussed
the necessity of subtransversality for alternating projections to converge R-linearly.
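
The constants appearing in Theorem 23.6.1 can be assembled mechanically from the prox-regularity constants $\epsilon_j$, the subtransversality constant $\kappa$ and the constant $\sigma$. The sketch below does this and reports whether the admissibility bound on $\bar\kappa = \kappa\sigma$ holds. The function name and the sample values are hypothetical, and the formulas are the ones stated in the theorem as reconstructed above.

```python
import math

def cyclic_projection_rate(eps, kappa, sigma):
    """Assemble eps_bar, alpha_bar, kappa_bar and the rate c for m = len(eps) sets.

    The rate is None when kappa_bar violates kappa_bar < sqrt((1 - alpha_bar) / (eps_bar alpha_bar)).
    """
    m = len(eps)
    eps_tilde = [4 * e * (1 + e) / (1 - e) ** 2 for e in eps]
    eps_bar = math.prod(1 + et for et in eps_tilde) - 1
    alpha_bar = m / (m + 1)
    kappa_bar = kappa * sigma
    if eps_bar == 0:
        bound = math.inf
    else:
        bound = math.sqrt((1 - alpha_bar) / (eps_bar * alpha_bar))
    rate = None
    if kappa_bar < bound:
        radicand = 1 + eps_bar - (1 - alpha_bar) / (kappa_bar ** 2 * alpha_bar)
        rate = math.sqrt(max(radicand, 0.0))   # clamp guards against inconsistent inputs
    return {
        "eps_bar": eps_bar,
        "alpha_bar": alpha_bar,
        "kappa_bar": kappa_bar,
        "bound": bound,
        "rate": rate,
    }

admissible = cyclic_projection_rate([0.01, 0.01], kappa=1.0, sigma=2.0)
too_large = cyclic_projection_rate([0.01, 0.01], kappa=2.5, sigma=1.0)
print(admissible)
print(too_large)
```

Even for small $\epsilon_j$ the admissible range of $\bar\kappa$ is narrow, which illustrates why the rate guaranteed by the theorem is typically close to 1.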


23.7 Application to Phase Retrieval Algorithms 


In Sect. 23.6 we have seen how to apply Theorem 23.4.1 to the method of cyclic
projections. This section is devoted to the analysis of other well-known algorithms
which we introduced in Sect. 23.3.2. The analysis in Sect. 23.6 focuses on showing
how to satisfy the assumptions of Theorem 23.4.1 in the context of set feasibility.
This section aims to provide a broad intuition for the convergence of projection-based
algorithms used to solve the phase retrieval problem. This also explains why the
statements given next are presented in a cartoon-like manner. The statements include
only the most important parts that yield local convergence, but not how to construct
it nor at which rate. Nevertheless, these are verifiable by following the approach in
Sect. 23.6.


Corollary 23.12 (convergence of the error reduction algorithm) Let $\mathrm{Fix}\,P_{\mathcal{S}} P_{M_1} \ne \emptyset$.
The error reduction algorithm, that is, alternating projections as discussed
in Sect. 6.2.3 on the sets $\mathcal{S}$ and $M_1$, converges locally linearly to a point
$x \in \mathrm{Fix}\,P_{\mathcal{S}} P_{M_1}$ whenever the mapping $\Phi = P_{\mathcal{S}} P_{M_1} - \mathrm{Id}$ is locally metrically
subregular at its zeros.


Proof Following Luke et al. in [32, Sect. 3.2.2], we represent $\mathbb{C}$ as $\mathbb{R}^2$ and reformulate
the phase retrieval problem as a feasibility problem with entrywise values in $\mathbb{R}^2$. Then
this is an application of Theorem 23.4.1 using Remark 23.5.1.


Remark 23.7.1 In contrast to Theorem 23.6.1, metric subregularity is required
directly in Corollary 23.12. Equivalently, we could demand subtransversality of the
collection of sets $\{\mathcal{S}, M_1\}$ plus the additional assumption (iii) in Theorem 23.6.1.
The problem here is that, until now, it is not clear when and where these two assumptions
are satisfied. Illustrative examples and numerical simulations indicate that they
hold in many instances. Nevertheless, there are certain situations when at least one
of the two assumptions is violated (see for instance [33]). Moreover, allowing metric
subregularity under some gauge sometimes depicts reality better than restricting
the analysis to a linear setting. One example is the setting of alternating projections
applied to the sphere $S$ and a line tangent to $S$ at $\bar x = (0, -1)$. In this instance the
algorithm does not converge linearly to $\bar x$, although it converges depending on the
initial point (see for instance [3]). This problem is not only interesting for the type
of convergence, but also when it comes to the actual numerical implementation of
algorithms. Although sets in real-life applications intersect tangentially on a set of
measure zero, beyond a certain numerical accuracy the distinction between tangential
intersection and linear convergence with a rate constant within 15 digits of 1 is rather
academic. Having a relatively large gap between sets for inconsistent feasibility is
in fact an advantage for the numerical performance of an algorithm.
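
The tangential example mentioned in Remark 23.7.1 can be reproduced in a few lines. The sketch below runs alternating projections between the unit circle in $\mathbb{R}^2$ and the line tangent to it at $\bar x = (0, -1)$, starting on the line. The function names and the starting point are hypothetical; the closed form $a_k = a_0/\sqrt{1 + k a_0^2}$ for the first coordinate follows from the recursion $a_{k+1} = a_k/\sqrt{1 + a_k^2}$ and is used only as a check.

```python
import math

def proj_circle(p):
    """Projection of a nonzero point onto the unit circle."""
    r = math.hypot(p[0], p[1])
    return (p[0] / r, p[1] / r)

def proj_tangent_line(p):
    """Projection onto the line {(a, b): b = -1}, tangent to the circle at (0, -1)."""
    return (p[0], -1.0)

def alternating_projections(a0, steps):
    """Distances to (0, -1) along the iteration x -> P_line(P_circle(x))."""
    x = (a0, -1.0)
    dists = [abs(a0)]
    for _ in range(steps):
        x = proj_tangent_line(proj_circle(x))
        dists.append(math.hypot(x[0], x[1] + 1.0))
    return dists

a0, steps = 1.0, 1000
dists = alternating_projections(a0, steps)
closed_form = a0 / math.sqrt(1.0 + steps * a0 ** 2)
last_ratio = dists[-1] / dists[-2]
print(dists[-1], closed_form, last_ratio)
```

The distance to $\bar x$ decays like $1/\sqrt{k}$ and the ratio of successive distances tends to 1, so no linear rate $c < 1$ exists, in line with the remark.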


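Theorem 23.7.2 (convergence of Fienup’s HIO method) Let $\beta_n = 1$ for all $n$ and
$\mathrm{Fix}\,\tfrac{1}{2}(R_{\mathcal{S}} R_{M_1} + \mathrm{Id}) \ne \emptyset$. The HIO algorithm defined in (6.9), that is, Douglas-
Rachford as defined in (6.15) on the sets $\mathcal{S}$ and $M_1$, converges locally linearly to a
point $x \in \mathrm{Fix}\,\tfrac{1}{2}(R_{\mathcal{S}} R_{M_1} + \mathrm{Id})$ whenever the mapping $\Phi = \tfrac{1}{2}(R_{\mathcal{S}} R_{M_1} + \mathrm{Id}) - \mathrm{Id}$
is locally metrically subregular at its zeros.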


Proof Since Fienup’s HIO for $\beta_n = 1$ for all $n$ can be identified with the Douglas-
Rachford method, the result follows from [3, Theorem 3.4].


Even if one had an infinite detector, noisy measurements make the phase retrieval
problem almost always inconsistent. It is easy to prove [8, Theorem 3.13] that, in this
case, $\mathrm{Fix}\,\tfrac{1}{2}(R_{\mathcal{S}} R_{M_1} + \mathrm{Id}) = \emptyset$ and so $\Phi$ does not possess zeros. Consequently,
Fienup’s HIO algorithm cannot converge. To circumvent this problem, one can use
a relaxed version of Douglas-Rachford, the relaxed averaged alternating reflections
method (RAAR) that we introduced in Sect. 6.1.2, which is adapted to inconsistent
feasibility.


Theorem 23.7.3 (convergence of relaxed averaged alternating reflections, RAAR)
Let $x \in \mathrm{Fix}\,T_{\mathrm{RAAR}}$ for $T_{\mathrm{RAAR}}$ defined in (6.22). The relaxed averaged alternating
reflections algorithm applied to a phase retrieval problem converges locally linearly to a point
$x \in \mathrm{Fix}\,T_{\mathrm{RAAR}}$ whenever the mapping $\Phi = \tfrac{\lambda}{2}(R_{\mathcal{S}} R_{M_1} + \mathrm{Id}) + (1 - \lambda) P_{M_1} - \mathrm{Id}$
is locally metrically subregular at its zeros.


A detailed proof of the convergence analysis for the relaxed averaged alternating 
reflection algorithm can be found in [33] by the authors of this chapter. There we use 
subtransversality of the collections of sets in general feasibility problems to make the 
connection to metric subregularity of the algorithm in question. The analysis does 
not use prox-regularity as the desired type of regularity for sets yielding the almost 
averaging property, but rather the property of being super-regular at a distance. This 
extends notions of regularity of sets to their effect on points that are not in the sets. 
Their definition is in line with e-subregularity and is thus connected to the analysis 
of [3]. 


Remark 23.7.4 In [33] we not only provided a convergence statement for the relaxed
averaged alternating reflections method, but also gave a description of the fixed point 
set of the underlying mapping. For super-regular sets at a distance, the fixed points, 
if they exist, are either points in the intersection of both sets or relate to the local 
gap between these, if the intersection of the sets is empty. This result is in line with 
[11] where Luke studied the case of one set being convex and the other prox-regular. 
In contrast to the original Douglas-Rachford algorithm, the main advantage of the 
relaxed version is that existence of fixed points does not depend on whether the 
feasibility problem is consistent. Connecting this observation to the convergence 
analysis presented here, in practice the Douglas-Rachford/HIO is much less stable 
than the relaxed version. 
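
The different behavior of Douglas-Rachford and its relaxation on inconsistent problems already shows up for two parallel lines in $\mathbb{R}^2$. The sketch below implements one step of the mapping $\tfrac{\lambda}{2}(R_{\mathcal{S}} R_{M} + \mathrm{Id}) + (1-\lambda)P_{M}$, where $\lambda = 1$ gives Douglas-Rachford. The two lines, the parameter value $\lambda = 0.5$ and all names are hypothetical and stand in for the actual (nonconvex) phase retrieval sets.

```python
def proj_S(p):
    """Projection onto S = {(a, b): b = 0}."""
    return (p[0], 0.0)

def proj_M(p):
    """Projection onto M = {(a, b): b = 1}; S and M do not intersect."""
    return (p[0], 1.0)

def reflect(proj, p):
    """Reflector R = 2P - Id."""
    q = proj(p)
    return (2 * q[0] - p[0], 2 * q[1] - p[1])

def raar_step(p, lam):
    """(lam / 2)(R_S R_M + Id) p + (1 - lam) P_M p; lam = 1 is Douglas-Rachford."""
    r = reflect(proj_S, reflect(proj_M, p))
    q = proj_M(p)
    return tuple(
        0.5 * lam * (ri + pi) + (1 - lam) * qi for ri, pi, qi in zip(r, p, q)
    )

def iterate(p, lam, steps):
    for _ in range(steps):
        p = raar_step(p, lam)
    return p

start = (2.0, 5.0)
dr = iterate(start, lam=1.0, steps=50)     # drifts by one unit per step
raar = iterate(start, lam=0.5, steps=50)   # settles at a fixed point
print(dr, raar)
```

For these lines the Douglas-Rachford step shifts the second coordinate by $-1$ in every iteration, so no fixed point exists, whereas the relaxed iteration contracts with factor $\lambda$ toward the fixed point with second coordinate $(1 - 2\lambda)/(1 - \lambda)$, which is determined by the gap between the two sets.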


Following the ideas above, it is not hard to show that most projection methods are 
pointwise almost averaged mappings when applied to the phase retrieval problem. 
Nonetheless, the property of metric subregularity is still an open problem in some 
important cases. Thus, local convergence can be easily verified, but it is hard to 
quantify. 


23.8 Final Remarks 


When it comes to computing (see Remark 23.7.1), whether a method converges, let
alone at which rate, depends on the numerical precision. Inconsistency also
has an impact on the numerical performance. Closely related to this, we want to stress
another feature of the analysis surveyed here. That is, sometimes less information 
can lead to better performance of an algorithm. For a demonstration we analyze a 
data set recorded by undergraduates at the X-Ray Physics Institute at the University 
of Göttingen. It is an optical diffraction image with model constraints v/T find = 
1, 2, ..., m, as in (23.1) with m = 1 and n the dimension of the image and additional 
support constraint. The full data set has dimension n = 1392 x 1040, the cropped 
data set n = 1287. The graphs shown in Figs. 23.1 and 23.2 are produced by applying
the alternating projection algorithm, i.e. error reduction, on the data sets individually. 




As it turns out alternating projections on the full data set (Fig. 23.2) shows a worse 
convergence behavior than the image with the limited data set (Fig. 23.1). Not only
does the algorithm need more iterations to reach a certain accuracy (9.8485 x 10+
instead of 666), but the rate of linear convergence, once the iterates reach a
suitable neighborhood, is also worse. Noteworthy is the observed gap in both problem
instances. In the full data set version the gap is smaller than in the version with a 
limited data set. We conjecture that this behavior is closely related to the property of 
metric subregularity, or in the context of set feasibility, subtransversality. The more, 
and better, information one has, the closer the constraint sets come to intersect. But 
this can include cases in which the sets intersect tangentially as well. In cases like
these the method of alternating projections does not have to converge locally linearly 
but can show a sublinear convergence behavior (see for instance [3, Remark 3.2]). 
The take home message in this context is that more information does not have to yield 
a better image when applying numerical algorithms. This is good news and bad news 
for these algorithms. The good news is that one can profit from implicit regularization 
with smaller problem sizes. The bad news is that this indicates a type of dimension 
dependence of these methods: the higher the dimension, the worse the constants 
in the linear convergence rates. This is not surprising and points to the need for 
models that lead to algorithms whose performance (that is, regularity) is dimension 
independent. While our discussion here focuses on the theoretical analysis rather 
than the comparison of the presented algorithms we point the reader to a study by 
Luke et al. [22], where the authors present a thorough review of first-order proximal 
methods for phase retrieval algorithms. 


Chapter 24
One-Dimensional Discrete-Time Phase Retrieval


Robert Beinert and Gerlind Plonka 


Abstract The phase retrieval problem has a long and rich history with applications 
in physics and engineering such as crystallography, astronomy, and laser optics. 
Usually, the phase retrieval consists in recovering a real-valued or complex-valued 
signal from the intensity measurements of its Fourier transform. If the complete 
phase information in the frequency domain is lost, then the problem of signal reconstruction
is severely ill-posed and possesses many non-trivial ambiguities. Therefore, it
can only be solved using appropriate additional signal information. We restrict our- 
selves to one-dimensional discrete-time phase retrieval from Fourier intensities and 
particularly consider signals with finite support. In the first part of this section, we 
study the structure of the arising ambiguities of the phase retrieval problem and show 
how they can be characterized using the given Fourier intensity. Employing these 
observations, in the second part, we study different kinds of a priori assumptions on 
the signal, where we are especially interested in their ability to reduce the non-trivial 
ambiguities or even to ensure uniqueness of the solution. In particular, we consider 
the assumption of non-negativity of the solution signal, additional magnitudes or 
phases of some signal components in time domain, or additional intensities of inter- 
ference measurements in frequency domain. Finally, we transfer our results to phase 
retrieval problems where the intensity measurements arise, for example, from the 
Fresnel or fractional Fourier transform. 

24.1 Introduction


In the classical phase retrieval problem, one is usually faced with the recovery of a 
complex-valued signal from intensity measurements of its Fourier transform. Recov- 
ery problems of this kind have many interesting applications in physics and engi- 
neering like crystallography [1-3], astronomy [4, 5], and laser optics [6, 7]. We 
particularly refer to Chap. 2. Without further information about the unknown signal, 
the phase retrieval problem is highly ambiguous such that the recovery of the true 
solution within the solution set is nearly hopeless. 

In this chapter, we consider the one-dimensional discrete-time variant of the phase 
retrieval problem, where we restrict ourselves to the recovery of complex-valued sig- 
nals with finite support length. The solution set of this problem can be characterized 
by investigating and factorizing the related autocorrelation function, which coincides 
with the squared given Fourier intensity, see [4, 8, 9]. As a consequence of this char- 
acterization, we show how ambiguities of the discrete-time phase retrieval problem 
are related to the true solution signal. Trivial ambiguities are caused by multiplication 
with a unimodular constant, time-shifts, and reflection and conjugation. Non-trivial 
ambiguities are essentially obtained by conjugation of linear factors of the algebraic 
polynomial being defined by the signal values [8]. The number of these non-trivial 
ambiguities essentially depends on the structure of the given intensities [10]. 
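
The trivial ambiguities can be checked directly on a short signal. The sketch below evaluates the Fourier intensity $|\sum_n x[n]\,e^{-i\omega n}|$ on a frequency grid and compares it for a rotated, a shifted, and a conjugated and reflected copy of the signal. The sign convention in the exponent, the sample signal, the grid and the function names are assumptions made for this illustration.

```python
import cmath
import math

def fourier_intensity(x, omegas):
    """|sum_n x[n] exp(-i omega n)| for a signal supported on n = 0, ..., len(x) - 1."""
    return [
        abs(sum(xn * cmath.exp(-1j * w * n) for n, xn in enumerate(x)))
        for w in omegas
    ]

def rotate(x, alpha):
    """Multiplication with the unimodular constant exp(i alpha)."""
    return [cmath.exp(1j * alpha) * xn for xn in x]

def shift(x, n0):
    """Time shift by n0 >= 0 samples, realized by prepending zeros."""
    return [0j] * n0 + list(x)

def conjugate_reflect(x):
    """Conjugation and reflection, re-indexed to the support 0, ..., len(x) - 1."""
    return [xn.conjugate() for xn in reversed(x)]

x = [1 + 2j, -0.5j, 3.0 + 0j, 2 - 1j]
omegas = [2 * math.pi * k / 16 for k in range(16)]
reference = fourier_intensity(x, omegas)

gaps = {}
for name, y in [
    ("rotation", rotate(x, 0.7)),
    ("time shift", shift(x, 3)),
    ("conjugate reflection", conjugate_reflect(x)),
]:
    intensity = fourier_intensity(y, omegas)
    gaps[name] = max(abs(a - b) for a, b in zip(reference, intensity))
    print(name, gaps[name])
```

All three modifications change the signal but leave the intensity unchanged up to rounding errors, so they cannot be distinguished from Fourier intensities alone.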

Depending on the application, one can incorporate different a priori conditions or 
further information about the unknown signal in order to get rid of the unwanted ambi- 
guities. One approach to reduce the solution set is to assume that the unknown signal 
is real-valued and non-negative. The non-negativity condition is usually employed if 
the original signal represents an intensity or a probability density, see for instance [3, 
4, 11]. The a priori non-negativity is, moreover, exploited by a variety of numerical 
methods like the alternating projection algorithm [5, 11-13] or adapted multilevel 
Gauß-Newton methods [6]. In the one-dimensional case considered here, the non- 
negativity constraint is, however, very erratic [14]. In special cases, the restricted 
phase retrieval problem can become uniquely solvable. However, in many situations, 
the non-negativity assumption may either not reduce the solution set at all or may 
lead to an empty solution set. 

Sometimes, like in wave front sensing and laser optics [7], one has additionally 
access to the magnitudes of the unknown signal itself. The obtained restricted one- 
dimensional phase retrieval problem with a priori magnitude information in time 
domain can be efficiently solved by multilevel Gauß-Newton methods [6, 15, 16]. 
While these numerical methods work well in certain cases, their stability strongly 
depends on the given Fourier intensities [17]. Moreover, the algorithms can con- 
verge to approximate solutions which essentially differ from the true solution signal. 
On the basis of these numerical observations, we study the question whether the 
knowledge of magnitudes of the signal components can guarantee uniqueness. Our 
findings imply that the related phase retrieval problems are uniquely solvable for 
almost all finite-length signals [8, 18]. But there also exist instances of non-unique 
phase retrieval problems with given magnitudes of the signal components [10]. Our 
results on uniqueness of solutions can be transferred to phase retrieval problems with
additional phase information in time domain [18]. 

A further approach to reduce the solution set or to ensure uniqueness is to exploit 
additional measurements in frequency domain, which arise from the interference of 
the unknown true signal with an appropriate reference signal. If the reference signal is 
known beforehand, the solution set of the discrete phase retrieval problem is reduced 
to at most two different signals [8, 19, 20]. Under mild assumptions, one can also use 
an unknown reference signal to guarantee uniqueness [8, 21-23]. Besides employing 
known or unknown reference signals that are not related to the unknown true signal, 
it is also possible to use a modulation of the unknown signal itself as a reference 
[24, 25]. 

One possible generalization of the classical phase retrieval problem is to replace 
the discrete-time Fourier transform by some other signal transform. If we restrict 
the one-dimensional discrete phase retrieval problem to signals with fixed support
$\{0, \dots, M-1\}$, which can be represented as $M$-dimensional vectors, then the
Fourier intensities $|\hat x(\omega_k)|$ at different points $\omega_k \in [-\pi, \pi]$ can be written as magnitudes
$|\langle x, v_k \rangle|$ with $v_k := (e^{i\omega_k m})_{m=0}^{M-1}$. If we now replace the Fourier vectors $v_k$
by elements of an arbitrary frame of $\mathbb{C}^M$, the question arises how to choose the frame
vectors to ensure a unique recovery of the true signal. This question has been studied,
for instance, in [26-29]. Further generalizations, where the Fourier transform
is replaced by the signed Fourier transform or by the short-time Fourier transform, 
have been studied in [30] and [31-33] respectively, see also the references therein. 

The generalization of the Fourier phase retrieval problem with respect to a suitable 
frame often goes hand in hand with the a priori assumption that the true signal 
possesses a sparse representation in the frame. Phase retrieval problems of this kind 
have been studied for the shearlet frame [34] and for the translation invariant Haar 
pyramid tight frame [35]. Certainly, the sparsity assumption is not restricted to frame 
representations. If the sparsity of the true signal is sufficiently strong, this a priori 
condition guarantees uniqueness in the classical phase retrieval problem too, see [31] 
and references therein. Moreover, the sparsity of the true signal plays a key role in 
the recovery of spike and spline functions [36, 37] as well as in the reconstruction 
of structured functions [38]. 

This chapter is organized as follows. In the first part, Sects. 24.2 and 24.3, we 
introduce the one-dimensional discrete-time phase retrieval problem in more detail 
and derive a characterization of the entire solution set by factorizing the autocorrela- 
tion function—the squared Fourier intensity—suitably. Using this characterization, 
we show that each ambiguity is caused by rotation, time-shift, and conjugation and 
reflection of the factors in a convolution representation of the true signal. 
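
A non-trivial ambiguity of this kind can be constructed explicitly. In the sketch below a signal is written as the convolution of two factors, and the second factor is replaced by its conjugated and reflected version. The factors, the frequency grid and the function names are hypothetical; the sign convention in the exponent is the same assumption as in the usual discrete-time Fourier transform.

```python
import cmath
import math

def convolve(a, b):
    """Discrete convolution of two finitely supported signals."""
    out = [0j] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out

def conjugate_reflect(x):
    """Conjugation and reflection, re-indexed to the support 0, ..., len(x) - 1."""
    return [xn.conjugate() for xn in reversed(x)]

def fourier_intensity(x, omegas):
    """|sum_n x[n] exp(-i omega n)| on a grid of frequencies."""
    return [
        abs(sum(xn * cmath.exp(-1j * w * n) for n, xn in enumerate(x)))
        for w in omegas
    ]

x1 = [1 + 0j, 2 + 0j]
x2 = [1 + 0j, 3 + 0j]
x = convolve(x1, x2)                       # the "true" signal
y = convolve(x1, conjugate_reflect(x2))    # one factor conjugated and reflected

omegas = [2 * math.pi * k / 32 for k in range(32)]
gap = max(
    abs(a - b)
    for a, b in zip(fourier_intensity(x, omegas), fourier_intensity(y, omegas))
)
print(x, y, gap)
```

The two signals have the same Fourier intensity, but the moduli of their entries differ, so $y$ is not obtained from $x$ by rotation, time shift, or conjugation and reflection of the whole signal.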

In the second part, Sects. 24.4-24.6, we exploit our findings on the solution
set to investigate different a priori assumptions and additional information about the 
signal with respect to their capability to ensure a unique recovery of the unknown true 
signal. In particular, we study the three a priori assumptions: non-negativity, addi- 
tionally known direct measurements or intensity measurements in time domain, and 
additional intensity measurements in the frequency domain. Here the measurements 
in the frequency domain arise from the interference of the true signal with another 
signal. We particularly study the interference with a known reference signal, the
interference with an unknown reference signal, and interference with modulations 
of the unknown solution signal. 

Finally, in Sect. 24.7, we briefly discuss a generalization of the discrete-time 
phase retrieval problem where the Fourier transform is replaced by a so-called linear 
canonical transform. The linear canonical transform covers an entire class of well- 
known transforms like the Fresnel and the fractional Fourier transform. Due to the 
structure of these transforms the characterization of the solution set and uniqueness 
guarantees can be easily transferred to the new setting. 


24.2 The Discrete-Time Phase Retrieval Problem 


The central task in phase retrieval is the recovery of an unknown complex-valued 
signal from the measured intensity of its Fourier transform. In other words, we 
have completely lost the phase information in the frequency domain. Although the 
Fourier transform itself is a well-understood isometric isomorphism, the missing 
phase significantly hampers the reconstruction process and turns the phase retrieval 
problem into an ill-posed, quadratic inverse problem. 

In this chapter, we consider the discrete-time version of the phase retrieval 
problem that can be stated as follows: recover an unknown complex-valued signal 
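$x := (x[n])_{n \in \mathbb{Z}}$ from its Fourier intensity

\[
  |\mathcal{F}[x](\omega)| := |\hat{x}(\omega)| := \Bigl| \sum_{n \in \mathbb{Z}} x[n] \, e^{-i\omega n} \Bigr| \qquad (\omega \in \mathbb{R}). \tag{24.1}
\]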


Throughout the chapter, we assume that the true signal $x$ has a finite support, which
means that only finitely many components $x[n]$ are non-zero. We say that the signal
$x$ has a support of length $N$ if there exists an integer $n_0$ such that $x[n_0]$ and
$x[n_0 + N - 1]$ are non-zero and $x[n] = 0$ for all $n \notin \{n_0, \dots, n_0 + N - 1\}$. Since
the exponential sum in (24.1) has only finitely many terms, the Fourier intensity $|\hat{x}|$
is here always well-defined.

The Fourier intensity $|\hat{x}|$ is closely related to the autocorrelation signal
$a := (a[n])_{n \in \mathbb{Z}}$ of $x$ given by

\[
  a[n] := \sum_{k \in \mathbb{Z}} \overline{x[k]} \, x[k+n] \qquad (n \in \mathbb{Z}).
\]

The coefficients of the autocorrelation signal are conjugate symmetric, which means
$a[n] = \overline{a[-n]}$ for all $n \in \mathbb{Z}$. Further, the support of the autocorrelation signal is
always $\{-N+1, \dots, N-1\}$, where $N$ again denotes the support length of the
original signal $x$, and does not depend on the actual position of the non-zero elements
of the true signal $x$.

Using the definition of the autocorrelation signal, we observe

\[
  |\hat{x}(\omega)|^2
  = \sum_{n \in \mathbb{Z}} \sum_{k \in \mathbb{Z}} x[n] \, \overline{x[k]} \, e^{-i\omega(n-k)}
  = \sum_{n \in \mathbb{Z}} \sum_{k \in \mathbb{Z}} \overline{x[k]} \, x[k+n] \, e^{-i\omega n}
  = \hat{a}(\omega),
\]

where $\hat{a}$ is called the autocorrelation function of $x$. The phase retrieval problem is
thus equivalent to the recovery of the true signal $x$ from its autocorrelation signal $a$.
Due to the support $\{-N+1, \dots, N-1\}$ of the autocorrelation signal $a$, the squared
intensity $|\hat{x}|^2$ is here a trigonometric polynomial of degree $N - 1$, which implies that
the Fourier intensity $|\hat{x}|$ is already completely determined by $2N - 1$ measurements
in $[-\pi, \pi)$. For convenience, we nevertheless assume that the entire Fourier intensity
is given.
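
The relation between the Fourier intensity and the autocorrelation signal can be checked numerically. The following is a minimal sketch using only the Python standard library; the function names and the test signal are hypothetical choices made for illustration, and the signal is assumed to be supported on the indices 0, ..., N - 1.

```python
import cmath

def fourier(x, w):
    """Evaluate x_hat(w) = sum_n x[n] exp(-i w n) for a signal supported on 0..N-1."""
    return sum(v * cmath.exp(-1j * w * n) for n, v in enumerate(x))

def autocorrelation(x):
    """Return {n: a[n]} with a[n] = sum_k conj(x[k]) x[k+n] for n = -N+1..N-1."""
    N = len(x)
    return {
        n: sum(complex(x[k]).conjugate() * x[k + n]
               for k in range(N) if 0 <= k + n < N)
        for n in range(-N + 1, N)
    }

def autocorrelation_function(a, w):
    """Evaluate a_hat(w) = sum_n a[n] exp(-i w n)."""
    return sum(v * cmath.exp(-1j * w * n) for n, v in a.items())

x = [1 + 2j, -0.5j, 3, 2 - 1j]           # hypothetical test signal, N = 4
a = autocorrelation(x)
for w in (0.0, 0.7, 2.1, -1.3):
    gap = abs(abs(fourier(x, w)) ** 2 - autocorrelation_function(a, w))
    print(f"w = {w:+.1f}: | |x_hat|^2 - a_hat | = {gap:.2e}")
```

The printed gaps should be at the level of floating-point rounding, and the dictionary `a` has exactly 2N - 1 entries, in line with the support of the autocorrelation signal.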


24.3 Trivial and Non-trivial Ambiguities 


The unknown phase of $\hat{x}$ in the frequency domain cannot be completely arbitrary
since the squared Fourier intensity is a trigonometric polynomial. However, without
further information, the phase retrieval problem is never uniquely solvable. The
simplest occurring ambiguities are


1. rotated signals $(e^{i\alpha} x[n])_{n \in \mathbb{Z}}$ with $\alpha \in \mathbb{R}$,
2. time-shifted signals $(x[n - n_0])_{n \in \mathbb{Z}}$ with $n_0 \in \mathbb{Z}$, and
3. the reflected and conjugated signal $(\overline{x[-n]})_{n \in \mathbb{Z}}$,


which obviously have the same Fourier intensity $|\hat{x}|$ as the true signal $x$. Since these
signals are, however, closely related to the true signal $x$, we call these ambiguities
trivial.
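
As a quick plausibility check of the three trivial ambiguities, the sketch below applies each operation to a small test signal and compares the Fourier intensities at a few frequencies. The signal is stored as a dictionary from index to value; the helper names and all numerical values are hypothetical.

```python
import cmath

def intensity(signal, w):
    """|x_hat(w)| for a finitely supported signal given as {index: value}."""
    return abs(sum(v * cmath.exp(-1j * w * n) for n, v in signal.items()))

def rotate(signal, alpha):
    """Rotated signal (exp(i alpha) x[n])."""
    return {n: cmath.exp(1j * alpha) * v for n, v in signal.items()}

def shift(signal, n0):
    """Time-shifted signal (x[n - n0])."""
    return {n + n0: v for n, v in signal.items()}

def reflect_conjugate(signal):
    """Reflected and conjugated signal (conj(x[-n]))."""
    return {-n: complex(v).conjugate() for n, v in signal.items()}

x = {0: 1 + 2j, 1: -0.5j, 2: 3.0, 3: 2 - 1j}     # hypothetical test signal
variants = {
    "rotation": rotate(x, 0.8),
    "time shift": shift(x, -5),
    "reflection and conjugation": reflect_conjugate(x),
}
for name, y in variants.items():
    gap = max(abs(intensity(x, w) - intensity(y, w)) for w in (0.0, 0.9, 2.4, -1.7))
    print(f"{name}: max intensity gap {gap:.1e}")
```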


In the following, we are interested in all non-trivial solutions of the discrete-time
phase retrieval problem. Before we give an explicit characterization, let us start with
the following observation. If our true signal $x$ can be represented as a convolution
$x = x_1 * x_2$ defined by

\[
  (x_1 * x_2)[n] := \sum_{k \in \mathbb{Z}} x_1[k] \, x_2[n-k] \qquad (n \in \mathbb{Z}),
\]

where $x_1$ and $x_2$ are two signals with finite support, then the Fourier convolution
theorem implies that the signal

\[
  \bigl(e^{i\alpha} \, \overline{x_1[-n]}\bigr)_{n \in \mathbb{Z}} * \bigl(x_2[n - n_0]\bigr)_{n \in \mathbb{Z}} \tag{24.2}
\]

with $\alpha \in \mathbb{R}$ and $n_0 \in \mathbb{Z}$ has the same Fourier intensity $|\hat{x}|$. Unlike the trivial
ambiguities, the constructed signal in (24.2) can have a completely different structure
than the original signal $x$. In this section, we will show that all ambiguities, trivial
and non-trivial, in discrete-time phase retrieval can be written as in (24.2), which
means that they are caused by rotation, time-shifts, and reflection and conjugation
of the single factors with respect to an appropriate convolution.

For this purpose, we will derive a suitable factorization of the given autocorrelation
function $\hat{a}$ by exploiting that the trigonometric polynomial $\hat{a}$ is closely related to the
algebraic polynomial $P_a$ of degree $2N - 2$ defined by

\[
  P_a(z) := \sum_{n=0}^{2N-2} a[n - N + 1] \, z^n \qquad (z \in \mathbb{C}). \tag{24.3}
\]

More precisely, we have

\[
  |\hat{x}(\omega)|^2 = \hat{a}(\omega) = e^{i\omega(N-1)} \, P_a(e^{-i\omega}).
\]

In the following, we call $P_a$ the algebraic polynomial associated to $\hat{a}$.

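Due to the conjugate symmetry $a[n] = \overline{a[-n]}$ for $n = 0, \dots, N - 1$, the polynomial
$P_a$ is here conjugate palindromic, which implies that all roots occur in pairs of
the form $(\gamma_j, \overline{\gamma}_j^{-1})$, where $\gamma_j$ and $\overline{\gamma}_j^{-1}$ have exactly the same multiplicity. Moreover,
zeros on the unit circle have an even multiplicity. Hence, the associated polynomial
can be written as

\[
  P_a(z) = a[N-1] \prod_{j=1}^{N-1} (z - \gamma_j)(z - \overline{\gamma}_j^{-1}).
\]

Using the identity

\[
  \bigl| (e^{-i\omega} - \gamma_j)(e^{-i\omega} - \overline{\gamma}_j^{-1}) \bigr|
  = |\overline{\gamma}_j|^{-1} \, |e^{-i\omega} - \gamma_j| \, |\overline{\gamma}_j - e^{i\omega}|
  = |\gamma_j|^{-1} \, |e^{-i\omega} - \gamma_j|^2, \tag{24.4}
\]

we obtain the factorization

\[
  \hat{a}(\omega) = |P_a(e^{-i\omega})|
  = |a[N-1]| \prod_{j=1}^{N-1} \bigl| (e^{-i\omega} - \gamma_j)(e^{-i\omega} - \overline{\gamma}_j^{-1}) \bigr|
  = |a[N-1]| \prod_{j=1}^{N-1} |\gamma_j|^{-1} \, |e^{-i\omega} - \gamma_j|^2,
\]

see for instance [8, 10, 39].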

The square root of $\hat{a}$ now yields the Fourier transform of a finitely supported signal,
and hence a solution of the phase retrieval problem with respect to the autocorrelation
function $\hat{a}$. Interchanging the role of $\gamma_j$ and $\overline{\gamma}_j^{-1}$ in (24.4), we can explicitly
construct further non-trivial solutions of the problem. With this idea in mind, we can
characterize all solutions $x$ of the discrete-time phase retrieval problem with given
squared Fourier intensity $|\hat{x}|^2 = \hat{a}$.

Theorem 24.1 ([8]) Let $\hat{a} \colon \mathbb{R} \to [0, \infty)$ be an arbitrary non-negative trigonometric
polynomial of degree $N - 1$. The Fourier transform of every finitely supported
signal $x$ with $|\hat{x}|^2 = \hat{a}$ can be written in the form

\[
  \hat{x}(\omega) = e^{i\alpha} \, e^{-i\omega n_0} \Bigl( |a[N-1]| \prod_{j=1}^{N-1} |\beta_j|^{-1} \Bigr)^{1/2} \prod_{j=1}^{N-1} (e^{-i\omega} - \beta_j), \tag{24.5}
\]

where $\alpha$ is a real number, $n_0$ is an integer, and $\beta_j$ is chosen from the zero pair
$(\gamma_j, \overline{\gamma}_j^{-1})$ of the associated polynomial $P_a$.


In Theorem 24.1, the trivial rotation ambiguity is covered by the factor $e^{i\alpha}$, and the
time-shift ambiguity by the factor $e^{-i\omega n_0}$. Further, if the true signal $x$ corresponds to the
zero set $\{\beta_1, \dots, \beta_{N-1}\}$, then the reflected and conjugated signal $\overline{x[-\cdot]}$ corresponds
to the zero set $\{\overline{\beta}_1^{-1}, \dots, \overline{\beta}_{N-1}^{-1}\}$. Consequently, the trivial reflection and conjugation
ambiguity is also covered.

Employing the representation (24.5) of all ambiguities in the frequency domain,
we can finally show that every non-trivial solution of the phase retrieval problem
$|\hat{x}|^2 = \hat{a}$ can be described by a suitable convolution factorization of the true signal $x$.


Theorem 24.2 ([8]) Let $x$ and $y$ be two discrete-time signals with finite support
and the same Fourier intensity $|\hat{x}| = |\hat{y}|$. Then there exist two finitely supported signals $x_1$
and $x_2$ such that

\[
  x = x_1 * x_2
\]

and

\[
  y = \bigl(e^{i\alpha} \, \overline{x_1[-n]}\bigr)_{n \in \mathbb{Z}} * \bigl(x_2[n - n_0]\bigr)_{n \in \mathbb{Z}},
\]

where $\alpha$ is a suitable real number and $n_0$ is a suitable integer.


Using the characterization of all solutions in Theorem 24.1, we can construct
$2^{N-1}$ zero sets $\{\beta_1, \dots, \beta_{N-1}\}$ by choosing either $\beta_j = \gamma_j$ or $\beta_j = \overline{\gamma}_j^{-1}$ for
$j = 1, \dots, N - 1$, and each zero set determines a solution $x$ of the phase retrieval problem
$|\hat{x}|^2 = \hat{a}$. Remembering that the reflection of all zeros at the unit circle corresponds
to the reflection and conjugation of the related signal, we can therefore have at most
$2^{N-2}$ different non-trivial solutions.
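
The construction of solutions by reflecting zeros at the unit circle can be sketched in a few lines. The code below is an illustration under stated assumptions, not the authors' implementation: it starts from a hypothetical set of zeros, builds the signal of the form (24.5) with shift zero for every choice between a zero and its reflection, and compares the Fourier intensities. All function names and numerical values are made up for the example.

```python
import cmath
from itertools import product

def poly_from_zeros(zeros):
    """Coefficients (ascending powers of z) of prod_j (z - beta_j)."""
    coeffs = [1 + 0j]
    for beta in zeros:
        nxt = [0j] * (len(coeffs) + 1)
        for k, c in enumerate(coeffs):
            nxt[k + 1] += c          # contribution of z * (c z^k)
            nxt[k] -= beta * c       # contribution of -beta * (c z^k)
        coeffs = nxt
    return coeffs

def signal_from_zeros(zeros, a_lead_abs, alpha=0.0):
    """Signal x[0..N-1] whose Fourier transform has the form (24.5) with n_0 = 0."""
    scale = a_lead_abs               # plays the role of |a[N-1]|
    for beta in zeros:
        scale /= abs(beta)
    factor = cmath.exp(1j * alpha) * scale ** 0.5
    return [factor * q for q in poly_from_zeros(zeros)]

def all_zero_flips(gammas):
    """Yield every zero set with beta_j = gamma_j or beta_j = 1 / conj(gamma_j)."""
    for choice in product((False, True), repeat=len(gammas)):
        yield [1 / g.conjugate() if flip else g for g, flip in zip(gammas, choice)]

def intensity(x, w):
    return abs(sum(v * cmath.exp(-1j * w * n) for n, v in enumerate(x)))

gammas = [0.5 + 0.5j, -2 + 0j, 0.3j]     # hypothetical zeros, so N = 4
reference = signal_from_zeros(gammas, 1.0)
for zeros in all_zero_flips(gammas):
    x = signal_from_zeros(zeros, 1.0)
    gap = max(abs(intensity(x, w) - intensity(reference, w))
              for w in (0.0, 0.4, 1.9, -2.5))
    print(f"max intensity gap {gap:.1e}")
```

For these three zeros the loop visits eight zero sets, all with the same Fourier intensity up to rounding; pairs of zero sets related by reflecting every zero describe signals that differ only by reflection and conjugation.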

The true number of non-trivially different solutions $x$ can, however, be much
smaller and depends on the number of different zero sets $\{\beta_1, \dots, \beta_{N-1}\}$ that can
be constructed. If all zeros lie on the unit circle, then $\gamma_j$ and $\overline{\gamma}_j^{-1}$ coincide for all
$j = 1, \dots, N - 1$, and the solution is thus unique. A similar observation holds if
some zero pairs $(\gamma_\ell, \overline{\gamma}_\ell^{-1})$ have a higher multiplicity $m_\ell > 1$, where the number of
pairwise different zero sets $\{\beta_1, \dots, \beta_{N-1}\}$ is then reduced accordingly.

Theorem 24.3 ([10]) Let $x$ be a discrete-time signal with finite support. Furthermore,
let $L$ be the number of distinct zero pairs $(\gamma_\ell, \overline{\gamma}_\ell^{-1})$ of the polynomial
$P_a$ associated to the autocorrelation function $\hat{a}$ not lying on the unit circle, and let $m_\ell$ be the
multiplicity of these zero pairs. The corresponding phase retrieval problem to recover
the signal $x$ has exactly

\[
  \Bigl\lceil \frac{1}{2} \prod_{\ell=1}^{L} (m_\ell + 1) \Bigr\rceil
\]

non-trivial ambiguities.
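
The count in Theorem 24.3 can be made plausible by a small combinatorial check. In the sketch below, a configuration records, for each distinct zero pair off the unit circle, how many of its copies are taken from the second member of the pair; reflecting all zeros maps the count k to m - k, and two configurations related in this way are only trivially different. The function names are hypothetical, and the ceiling formula is read here as stated in the theorem.

```python
from itertools import product
from math import prod

def count_formula(multiplicities):
    """Ceiling of (1/2) * prod(m + 1) over all zero pairs off the unit circle."""
    return (prod(m + 1 for m in multiplicities) + 1) // 2

def count_by_enumeration(multiplicities):
    """Count zero-set configurations up to reflecting all zeros simultaneously."""
    seen = set()
    for config in product(*(range(m + 1) for m in multiplicities)):
        mirrored = tuple(m - k for m, k in zip(multiplicities, config))
        seen.add(min(config, mirrored))   # one representative per mirrored pair
    return len(seen)

for ms in ([1, 1, 1], [4], [2, 2], [1, 2, 3]):
    print(ms, count_formula(ms), count_by_enumeration(ms))
```

For simple zero pairs the count reduces to half of the number of zero sets, and an empty list of multiplicities (all zeros on the unit circle) gives a single solution.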


Example 24.1 The actual number of non-trivial ambiguities in phase retrieval 
strongly depends on the zeros of the autocorrelation function. For example, the phase 
retrieval problem related to the autocorrelation function given by 


G(w) = | Pale) | = | (e + $) (e™ +2) | fem + ei |, 


has exactly three non-trivially different solutions, namely 


and 


Reflecting more than two zeros at the unit circle from 1/2 to 2 only produces further
trivial ambiguities caused by conjugation of the linear factors. The absolute value 
and the coefficients of the three non-trivially different solutions x1, x2, and x3 are 
shown in Fig. 24.1. 


(a) Fourier intensities. (b) Signal moduli. (c) Signal phase.


Fig. 24.1 Phase retrieval problem $\hat{a} = |\hat{x}|^2$ with exactly three non-trivially different solutions as
in Example 24.1, see [10]


24.4 Non-negative Signals


As we have seen in Theorem 24.3, the solution set of the discrete-time phase retrieval 
problem usually consists of a vast number of non-trivially different solutions that 
strongly differ in shape and form. To recover the true signal x within the solution 
set, we have to rely on further a priori knowledge on the desired signal. In many 
applications, we can assume that the unknown signal is real-valued and non-negative, 
see for instance [3, 4, 11]. Therefore, we study the problem: how many non-trivial 
real-valued non-negative signals exist satisfying $|\hat{x}|^2 = \hat{a}$ for a given autocorrelation
function? In other words, can the a priori assumption that x is real-valued and non- 
negative help us to find a unique solution or at least essentially reduce the number 
of non-trivial ambiguities? 

Let us now assume that $x$ has a finite support of the form $\{0, \dots, N - 1\}$, and
that all components $x[n]$ with $n \in \{0, \dots, N - 1\}$ are real and non-negative. The
representation (24.5) in Theorem 24.1 without rotations and time-shifts, that is, with $\alpha = 0$
and $n_0 = 0$, yields the solution

\[
  \hat{x}(\omega) = |a[N-1]|^{1/2} \prod_{j=1}^{N-1} |\beta_j|^{-1/2} \, (e^{-i\omega} - \beta_j). \tag{24.6}
\]


The non-negativity of $x$ is now equivalent to the condition that the coefficients of
the algebraic polynomial $Q$ given by

\[
  Q(z) := \prod_{j=1}^{N-1} (z - \beta_j) \tag{24.7}
\]

are non-negative. Since the zeros $\beta_j$ are always non-zero, the leading coefficient and
the absolute term of $Q$ even have to be strictly positive. Algebraic polynomials of
this kind are usually called positive polynomials.

A closer inspection shows that the non-negativity condition does not always reduce 
the number of non-trivial ambiguities of the phase retrieval problem. 


Theorem 24.4 ([14]) Let $x$ be a real-valued discrete-time signal with finite support.
If the zero set $\{\beta_1, \dots, \beta_{N-1}\}$ corresponding to $\hat{x}$ in (24.6) is contained in the left
half plane, which means that $\operatorname{Re} \beta_j < 0$ for all $j = 1, \dots, N - 1$, then all occurring
real-valued non-trivial ambiguities of the corresponding phase retrieval problem are
non-negative.


Proof Since the polynomial $Q$ in (24.7) is real-valued, the zeros $\beta_j$ have to be real
or have to come in conjugated pairs $(\beta_j, \overline{\beta}_j)$. The corresponding linear or quadratic
factors have the form

\[
  (e^{-i\omega} - \beta_j)
  \quad\text{or}\quad
  (e^{-i\omega} - \beta_j)(e^{-i\omega} - \overline{\beta}_j) = e^{-2i\omega} - 2 \, (\operatorname{Re} \beta_j) \, e^{-i\omega} + |\beta_j|^2.
\]

Fig. 24.2 Non-reduced non-negative solution set for the phase retrieval problem $\hat{a} = |\hat{x}|^2$, see [10]


By assumption all coefficients in these factors are non-negative. Therefore, also all 
coefficients of the polynomial Q are non-negative, and Q therefore always leads to 
a non-negative solution x of the phase retrieval problem. 
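
The argument of the proof can be illustrated numerically. The sketch below assumes a hypothetical zero set in the left half plane, grouped into real zeros and conjugate pairs, reflects each group at the unit circle in all possible ways, and expands the polynomial Q from (24.7). Since the signal is a positive multiple of the coefficient vector of Q, it suffices to inspect the signs of these coefficients. All names and values are illustrative.

```python
from itertools import product

def poly_from_zeros(zeros):
    """Coefficients (ascending powers of z) of prod_j (z - beta_j)."""
    coeffs = [1 + 0j]
    for beta in zeros:
        nxt = [0j] * (len(coeffs) + 1)
        for k, c in enumerate(coeffs):
            nxt[k + 1] += c
            nxt[k] -= beta * c
        coeffs = nxt
    return coeffs

def reflect(beta):
    """Reflection of a zero at the unit circle."""
    return 1 / beta.conjugate()

def real_ambiguity_coefficients(blocks):
    """Coefficients of Q for every reflection pattern that keeps Q real.

    Each block is a real zero (beta,) or a conjugate pair (beta, conj(beta));
    the zeros of a block are reflected together.
    """
    for choice in product((False, True), repeat=len(blocks)):
        zeros = [reflect(b) if flip else b
                 for block, flip in zip(blocks, choice) for b in block]
        yield [c.real for c in poly_from_zeros(zeros)]

# Hypothetical zero set in the left half plane.
blocks = [(-3 + 0j,), (-1 + 2j, -1 - 2j), (-0.5 + 0.5j, -0.5 - 0.5j)]
for coeffs in real_ambiguity_coefficients(blocks):
    print(min(coeffs) >= 0, [round(c, 3) for c in coeffs])
```

Reflection at the unit circle keeps the argument of a zero, so reflected zeros stay in the left half plane and every pattern should report non-negative coefficients. Replacing one block by a pair in the right half plane, for instance `(1 + 1j, 1 - 1j)`, produces a negative coefficient.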


However, if the assumption of Theorem 24.4 is not satisfied, the number of non- 
negative non-trivial ambiguities of the discrete-time phase retrieval problem is usu- 
ally reduced. 


Example 24.2 Figures 24.2, 24.3 and 24.4 show some different cases which can 
occur under the restriction of non-negativity. Figure 24.2 presents all non-trivial 
solutions that can be constructed from the autocorrelation function $|\hat{x}|^2$, where $x$ is
the marked signal of length 6 being determined by the zero set 


{G1,..-, Gs} el B, 3, 2, £, aF 


As shown in Theorem 24.4, all non-trivial solutions being constructed via (24.6) are 
real and non-negative. The solution set is presented without reflected, conjugated 
signals. In this example, we have $2^4 = 16$ different solutions, which is the maximal
number of non-trivial ambiguities by Theorem 24.3. 

In the second example, see Fig. 24.3, the condition of non-negativity is strong
enough to ensure uniqueness of the phase retrieval problem. Here, the unique
non-negative solution $x_1$ corresponds to the zero set


{B1,..., Bs} := [-3, -4 + 3i, 4 3i, 1+ li, 1- li}. 


Note that the problem has only four non-trivial ambiguities since the complex zeros 
of the characterization (24.5) additionally have to be chosen as complex conjugated 
pairs. 

Fig. 24.3 Unique non-negative solution of the phase retrieval problem $\hat{a} = |\hat{x}|^2$, see [10]

Fig. 24.4 Empty non-negative solution set for the phase retrieval problem $\hat{a} = |\hat{x}|^2$, see [10]

In the last example, Fig. 24.4, the restriction of non-negativity is too strong since
every solution of the phase retrieval problem possesses some negative coefficients,
which means that the given phase retrieval problem cannot be solved by a real-valued
non-negative signal. The signal $x_1$ in Fig. 24.4b here corresponds to the zero set


(n. Os} = {5,-5 + $4, -5 — Zi, 1 + li, 1 — li}. 


Since the coefficients of an algebraic polynomial continuously depend on the 
zero set $\{\beta_1, \dots, \beta_{N-1}\}$, the number of non-negative non-trivial ambiguities around
a true signal x with non-zero components remains unchanged in a certain (small) 
neighbourhood of the zero set. On the basis of this observation, one can show that 
neither the signals that are uniquely defined by their Fourier intensity and the a 
priori known non-negativity nor the signals that are not uniquely defined by these 
conditions form negligible sets. Unfortunately, there is no simple way to decide from 
the given intensity data whether the a priori condition of non-negativity is helpful 
for reducing the number of ambiguities of the discrete-time phase retrieval problem. 
Mathematically, we can show the following result. 


Theorem 24.5 ([14]) The set of real-valued discrete-time signals with support
$\{0, \dots, N - 1\}$ of length $N > 0$ that can be recovered uniquely up to reflection as
well as the set of signals that cannot be recovered uniquely from their Fourier intensities
employing the non-negativity constraint are both unbounded sets containing a
cone of infinite Lebesgue measure.

In other words, the non-negativity of the true signal is not an appropriate a priori
assumption in order to guarantee uniqueness of the solution of the related
one-dimensional phase retrieval problem.


24.5 Additional Data in Time-Domain 


In certain applications like electron microscopy, wave front sensing and laser optics
[6], we may have direct access to one or more signal values $x[n]$ or to magnitudes
$|x[n]|$ in the time domain. In order to exploit this additional information, we need
to know the position of the measurements within the support of the true signal.
To simplify the following considerations, we restrict ourselves to finite discrete-time
signals $x = (x[n])_{n \in \mathbb{Z}}$ with support $\{0, \dots, N - 1\}$. These signals may be interpreted
as complex-valued vectors $x = (x[n])_{n=0}^{N-1}$ in $\mathbb{C}^N$. If $x[0]$ and $x[N - 1]$ are additionally
non-zero, then we call the support of the signal $x$ with length $N$ normalized.

Xu et al. already considered the a priori constraint that, besides $|\hat{x}|$ for a real-valued
signal, also the endpoint $x[N - 1]$ is known [40], which almost always enforces
uniqueness. In [18], these ideas have been generalized to discrete-time phase retrieval
problems with given magnitudes of the form $|x[n]|$ or partial phase information
$\arg(x[n])$ in the time domain.

Again, the question arises whether a priori information of this type is sufficient 
to determine a unique solution of the discrete-time phase retrieval problem (up to 
trivial ambiguities). 


24.5.1 Using an Additional Signal Value 


In order to get a heuristic idea, we start with the following question: for a given
non-negative trigonometric polynomial $\hat{a}$ of degree $N - 1$ as in Theorem 24.1 and a
given constant $C \in \mathbb{C}$, how many non-trivial solutions $x$ with support $\{0, \dots, N - 1\}$
exist for the constrained phase retrieval problem

\[
  |\hat{x}|^2 = \hat{a} \quad\text{and}\quad x[N-1] = C \,? \tag{24.8}
\]


As we know already from Theorem 24.1, there exist at most $2^{N-2}$ non-trivially
different signals $x = (x[n])_{n \in \mathbb{Z}}$ with Fourier intensity $|\hat{x}|^2 = \hat{a}$. But how many of
these solutions also satisfy the side condition $x[N - 1] = C$?

To answer this question, we employ our knowledge about the structure of the solutions
in (24.5). Recalling that $\hat{x}(\omega) = \sum_{n=0}^{N-1} x[n] \, e^{-i\omega n}$, we notice that the coefficient
$x[N - 1]$ in (24.5) with $n_0 = 0$ is given by

\[
  x[N-1] = e^{i\alpha} \Bigl( |a[N-1]| \prod_{j=1}^{N-1} |\beta_j|^{-1} \Bigr)^{1/2}.
\]

We therefore derive the consistency condition

\[
  |C|^2 = |a[N-1]| \prod_{j=1}^{N-1} |\beta_j|^{-1}. \tag{24.9}
\]

If this condition is not satisfied, there will be no solution signal satisfying the
side condition $x[N - 1] = C$. Assuming that (24.9) is satisfied for some zero set
$\{\beta_1, \dots, \beta_{N-1}\}$, we find at least one solution of (24.8), where we take $\alpha$ according
to the phase of the complex value $C$. By Theorem 24.1, all further solutions with
Fourier intensity $|\hat{x}|^2 = \hat{a}$ are obtained by reflecting zeros from $\beta_j$ to $\overline{\beta}_j^{-1}$ at the unit
circle for some indices $j \in \{1, \dots, N - 1\}$ in the representation (24.5).

Let us now assume that there is indeed a second solution $\tilde{x}$ of (24.8) satisfying
(24.9), and let $\{\tilde{\beta}_1, \dots, \tilde{\beta}_{N-1}\}$ be the corresponding zero set in (24.5). Then we can
assume without loss of generality that the corresponding zeros are given by

\[
  \tilde{\beta}_j =
  \begin{cases}
    \overline{\beta}_j^{-1} & j = 1, \dots, L, \\
    \beta_j & \text{otherwise}
  \end{cases}
\]

for some $L \in \{1, \dots, N - 1\}$. The consistency condition (24.9) now implies

\[
  \prod_{j=1}^{N-1} |\beta_j|^{-1} = \prod_{j=1}^{L} |\beta_j| \prod_{j=L+1}^{N-1} |\beta_j|^{-1}
\]

and thus $\prod_{j=1}^{L} |\beta_j|^2 = 1$. We can therefore state the following theorem.


Theorem 24.6 ([8]) Let $x$ be a complex-valued discrete-time signal with normalized
support of length $N$, i.e., $\hat{x}$ is of the form (24.5) with $n_0 = 0$. Then the constrained
phase retrieval problem to recover the signal $x$ from its Fourier intensity $|\hat{x}|$ and the
signal value $x[N - 1]$ is uniquely solvable if and only if

\[
  \prod_{\beta_j \in A} |\beta_j|^2 \neq 1
\]

for each non-empty subset $A$ of $B$, where $B$ denotes the set of values in the corresponding
zero set of $x$ not lying on the unit circle.
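
The subset condition of Theorem 24.6 can be tested by brute force for small zero sets. The sketch below is a direct transcription of the condition under two assumptions that are not taken from the source: repeated zeros are treated as separate elements of B, and a numerical tolerance decides whether a modulus or a product equals one. The function name is hypothetical, and the cost grows exponentially with the number of zeros.

```python
from itertools import combinations
from math import isclose, prod

def endpoint_condition_holds(zeros, tol=1e-9):
    """True if no non-empty subset of off-circle zeros has prod |beta|^2 equal to 1."""
    off_circle = [b for b in zeros if not isclose(abs(b), 1.0, abs_tol=tol)]
    for size in range(1, len(off_circle) + 1):
        for subset in combinations(off_circle, size):
            if isclose(prod(abs(b) ** 2 for b in subset), 1.0, abs_tol=tol):
                return False          # reflecting this subset keeps |x[N-1]|
    return True

print(endpoint_condition_holds([0.5, 3, 0.3j]))   # no critical subset
print(endpoint_condition_holds([0.5, 4, 0.5j]))   # 0.25 * 16 * 0.25 == 1
```

In the second call the three moduli multiply to one, so reflecting all three zeros yields a different zero set that satisfies the same consistency condition (24.9).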


Since the support of $x$ is normalized to $\{0, \dots, N - 1\}$, and since the rotation
factor $\alpha$ in (24.5) is here fixed by the phase of the given constant $C$, we have no trivial
ambiguities caused by rotation or shift. Moreover, the reflection and conjugation
ambiguity cannot occur since we have here particularly assumed $\prod_{j=1}^{N-1} |\beta_j|^2 \neq 1$.
The simplification of the set of zeros to those with modulus different from 1 can be
done since the reflection of zeros on the unit circle does not lead to further non-trivial
solutions.


24.5.2 Using Additional Magnitude Values of the Signal


Next, we generalize the problem considered in (24.8) and assume that, besides the
Fourier intensity $|\hat{x}|^2$, either all or at least some of the magnitudes $|x[n]|$ with
$n = 0, \dots, N - 1$ are given. Phase retrieval problems with these constraints have been
considered for example in [6, 15, 16]. The numerical approaches to find the phase
retrieval solution in [15] are based on multilevel Gauss-Newton methods. However,
these algorithms are not always stable and sometimes reconstruct signals that are
different from the desired solution.

We therefore study the uniqueness of solutions of the following constrained phase
retrieval problem: for a given non-negative trigonometric polynomial $\hat{a}$ of degree
$N - 1$ as in Theorem 24.1 and a given $C > 0$, how many non-trivial solutions $x$ with
support $\{0, \dots, N - 1\}$ exist to the constrained phase retrieval problem

\[
  |\hat{x}|^2 = \hat{a} \quad\text{and}\quad |x[N-1]| = C \,? \tag{24.10}
\]


To characterize the solutions of (24.10), we can proceed similarly as in Sect. 24.5.1.
Doing so, we obtain the same consistency condition (24.9) but, obviously, the given
absolute value $|x[N - 1]|$ will give us no information how to choose the rotation
factor $e^{i\alpha}$; so we cannot get rid of the rotation ambiguity.


Corollary 24.1 ([8]) Let $x$ be a complex-valued discrete-time signal with normalized
support of length $N$, i.e., $\hat{x}$ is of the form (24.5) with $n_0 = 0$. Then the constrained
phase retrieval problem to recover the signal $x$ from its Fourier intensity $|\hat{x}|$ and the
absolute value $|x[N - 1]|$ is uniquely solvable up to rotations if and only if

\[
  \prod_{\beta_j \in A} |\beta_j|^2 \neq 1
\]

for each non-empty subset $A$ of $B$, where $B$ denotes the set of values in the corresponding
zero set of $x$ not lying on the unit circle.


In [18], we have generalized these observations to the constrained phase retrieval
problem of the form

\[
  |\hat{x}|^2 = \hat{a} \quad\text{and}\quad |x[n]| = C \tag{24.11}
\]

for some $n \in \{0, \dots, N - 1\}$. Similarly as before, one can derive a consistency condition
and a condition in terms of the zeros of the autocorrelation function $\hat{a}$ such
that uniqueness of a solution $x$ of (24.11) is guaranteed up to trivial ambiguities.
However, the corresponding conditions are more complex and require an extensive
investigation of the $(N - 1)$-variate elementary symmetric polynomials, which are
related to the components of the true signal $x$ by Vieta's formulae. The important
outcome of these investigations can be summarized as follows.

Theorem 24.7 ([18]) Let $x$ be a complex-valued discrete-time signal with normalized
support of length $N$, and let $\ell$ be an arbitrary integer between $0$ and $N - 1$. The
phase retrieval problem to recover the signal $x$ from its Fourier intensity $|\hat{x}|$ and
the absolute value $|x[N - 1 - \ell]|$ is almost always uniquely solvable up to rotations
whenever $\ell \neq (N-1)/2$. In the special case that $\ell = (N-1)/2$, the reconstruction
is almost always unique up to rotations and conjugate reflections.


'Almost always' means here that the union of all signals with normalized support
of length $N$, which permit a further non-trivial solution, corresponds to the union of
finitely many algebraic varieties with Lebesgue measure zero in $\mathbb{R}^{2N}$, see [18]. In
particular, we almost always obtain uniqueness if the magnitudes of all signal values
$|x[n]|$, $n = 0, \dots, N - 1$, are given.


Corollary 24.2 ([10]) Let $x$ be a complex-valued discrete-time signal with normalized
support of length $N$. The phase retrieval problem to recover the signal $x$
from its Fourier intensity $|\hat{x}|$ and its moduli $(|x[n]|)_{n \in \mathbb{Z}}$ is almost always uniquely
solvable up to rotations.


Obviously, Corollary 24.2 is a simple consequence of Theorem 24.7. But the
following question remains: is the knowledge of all magnitudes $|x[n]|$ with
$n = 0, \dots, N - 1$ already sufficient to obtain a unique solution of the constrained problem

\[
  |\hat{x}|^2 = \hat{a} \quad\text{and}\quad |x[n]| = C_n \quad\text{for } n = 0, \dots, N - 1
\]

up to rotation ambiguities? The following example shows that this is unfortunately
not the case.


Example 24.3 We consider the complex-valued signal $x$ determined by the corresponding
zeros


Bi=-}-3i A=-e (+i), and Bis -e 11 $i) 


and by $\alpha = 0$ and $n_0 = 0$ in the representation (24.5). Knowing the autocorrelation
function $\hat{a}(\omega) = |\hat{x}(\omega)|^2$ for $\omega \in \mathbb{R}$ and the moduli of all components $|x[n]|$ for
$n \in \mathbb{Z}$, we still cannot recover $x$ uniquely. In this specific example, we find, up to
rotations, three non-trivial solutions that are presented in Fig. 24.5.

It is possible to construct further examples with several non-trivial solutions for 
all dimensions N of the problem, see [10]. Hence, the a priori known moduli of the 
components strongly reduce the set of ambiguities, but we cannot ensure uniqueness 
(up to trivial ambiguities) for every signal. 


Using an analogous approach, one can study the restricted discrete-time phase
retrieval problem where the knowledge of additional magnitudes in time domain is
replaced by a priori phase information in time domain. Due to the trivial rotation
ambiguity, we can only expect to reduce the non-trivial solution set if we have given
the phase of at least two signal components.

Fig. 24.5 Three non-trivial solutions of phase retrieval problem $\hat{a} = |\hat{x}|^2$ with given moduli $|x[n]|$
for $n = 0, \dots, 3$ as in Example 24.3, see [10]


Theorem 24.8 ([18]) Let $x$ be a complex-valued discrete-time signal with normalized
support of length $N$, and let $\ell_1$ and $\ell_2$ be different integers between $0$ and $N - 1$.
The phase retrieval problem to recover the signal $x$ from its Fourier intensity and
the two phases $\arg x[N - 1 - \ell_1]$ and $\arg x[N - 1 - \ell_2]$ is almost always uniquely
solvable whenever $\ell_1 + \ell_2 \neq N - 1$. If $\ell_1 + \ell_2 = N - 1$, then the reconstruction is
only unique up to conjugate reflections, except for the special case where $\ell_1$ and $\ell_2$
correspond to the two endpoints.


24.6 Interference Measurements 


Another possibility to reduce the ambiguities in the considered discrete-time phase
retrieval problem is to exploit additional reference measurements of the form
$|\mathcal{F}[x + h]|$, where $h$ is a suitable reference signal with finite support. We consider here two
cases: either the reference signal $h$ is known beforehand, or it is also unknown. In
the first case, we will show that the corresponding phase retrieval problem with
given intensities $|\mathcal{F}[x]|$ and $|\mathcal{F}[x + h]|$ has at most two non-trivial solutions. If $h$ is
unknown, we assume that the Fourier intensities $|\mathcal{F}[x]|$, $|\mathcal{F}[h]|$, and $|\mathcal{F}[x + h]|$ are
given and show unique recovery results under suitable side conditions. Finally, we
will examine the special case where the unknown reference signal is a modulated
version of the true signal $x$ itself.


24.6.1 Interference with a Known Reference Signal 


Let us assume that the considered reference signal $h = (h[n])_{n \in \mathbb{Z}}$ has finite support
and is completely known beforehand. The corresponding phase retrieval problem is
then nearly uniquely solvable up to at most one ambiguity, see [8, 20].

Theorem 24.9 ([8]) Let $x$ and $h$ be two discrete-time signals with finite support,
where the non-vanishing reference signal $h$ is known beforehand. Then the signal $x$
can be recovered from the Fourier intensities

\[
  |\mathcal{F}[x]| \quad\text{and}\quad |\mathcal{F}[x + h]|
\]

except for at most one ambiguity. This ambiguity is trivial if $h$ possesses a linear
phase.


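Proof Let $y = x + h$ be the interference between the unknown signal $x$ and the
known reference signal $h$. Then $y$ is a finite length signal with known Fourier intensity
$|\mathcal{F}[y]| = |\mathcal{F}[x + h]|$. Further, with $\hat{x}(\omega) = |\hat{x}(\omega)| \, e^{i\phi(\omega)}$ and $\hat{h}(\omega) = |\hat{h}(\omega)| \, e^{i\psi(\omega)}$,
it follows

\[
  \begin{aligned}
    |\hat{y}(\omega)|^2 &= |\hat{x}(\omega)|^2 + |\hat{h}(\omega)|^2 + 2 \operatorname{Re}\bigl(\hat{x}(\omega) \, \overline{\hat{h}(\omega)}\bigr) \\
    &= |\hat{x}(\omega)|^2 + |\hat{h}(\omega)|^2 + 2 \, |\hat{x}(\omega)| \, |\hat{h}(\omega)| \cos\bigl(\phi(\omega) - \psi(\omega)\bigr)
  \end{aligned}
\]

such that we can extract the phase difference $\phi(\omega) - \psi(\omega)$ up to the sign and a
multiple of $2\pi$ for every $\omega \in \mathbb{R}$. Due to the piecewise continuity of the phases $\phi$ and
$\psi$, there is an open interval where the sign has to be either plus or minus everywhere.
Since each trigonometric polynomial is completely determined by its values on an
open set, we conclude that there can be at most two different solutions. If we write
these solutions $x$ and $\tilde{x}$ in the form $\hat{x}(\omega) = |\hat{x}(\omega)| \, e^{i\phi_1(\omega)}$ and $\hat{\tilde{x}}(\omega) = |\hat{x}(\omega)| \, e^{i\phi_2(\omega)}$,
then the phases $\phi_1$ and $\phi_2$ are related by

\[
  \phi_1(\omega) - \psi(\omega) = -\phi_2(\omega) + \psi(\omega) + 2\pi \ell_\omega,
\]

i.e., $\phi_2(\omega) = -\phi_1(\omega) + 2\psi(\omega) + 2\pi \ell_\omega$ with an integer $\ell_\omega$. If the reference $h$ has linear phase, which
means that the phase $\psi$ is of the form $\psi(\omega) = n_0 \omega + \alpha$ for some $n_0 \in \mathbb{Z}$ and $\alpha \in \mathbb{R}$,
then $\tilde{x}$ is a trivial ambiguity of $x$ obtained by support shift and rotation.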
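
The first step of the proof, extracting the cosine of the phase difference from the three intensities, can be sketched as follows. The signals and function names are hypothetical, and the frequencies are chosen such that neither Fourier transform vanishes there. The check at the end confirms that, at each single frequency, the mirrored phase is consistent with the same interference intensity; whether the mirrored phases belong to a finitely supported signal is a separate question that the pointwise check does not answer.

```python
import cmath
import math

def fourier(x, w):
    """Evaluate x_hat(w) = sum_n x[n] exp(-i w n) for a signal supported on 0..N-1."""
    return sum(v * cmath.exp(-1j * w * n) for n, v in enumerate(x))

def cos_phase_difference(ix, ih, iy):
    """cos(phi - psi) from |x_hat|, |h_hat| and |x_hat + h_hat| at one frequency."""
    return (iy ** 2 - ix ** 2 - ih ** 2) / (2 * ix * ih)

x = [1 + 1j, 2, -1j]        # hypothetical unknown signal
h = [0.5, 1j]               # hypothetical known reference
for w in (0.0, 0.9, 2.3, -1.1):
    xh, hh = fourier(x, w), fourier(h, w)
    c = cos_phase_difference(abs(xh), abs(hh), abs(xh + hh))
    phi, psi = cmath.phase(xh), cmath.phase(hh)
    # Candidate with mirrored phase difference: phi_2 = -phi_1 + 2 psi.
    mirrored = abs(xh) * cmath.exp(1j * (2 * psi - phi))
    print(f"w = {w:+.1f}: cos error {abs(c - math.cos(phi - psi)):.1e}, "
          f"intensity gap {abs(abs(mirrored + hh) - abs(xh + hh)):.1e}")
```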


If the signal $h$ does not have linear phase, then the ambiguity $\tilde{x}$ can be non-trivially
different as discussed in the next example. 


Example 24.4 Let us consider the discrete-time phase retrieval problem to recover 
the true signal 


x:= zh (...,0,55 — 15i, —84 + 87i, 34 + 82i, 
204 — 120i, —16 + 16i, —96, 128, 0,...) 


from the Fourier intensities | F[x]| and | F[x + h]|, where h is the known reference 
signal 


h:=(...,0,0, 20 — 10i, 19 17i, —4 — 41,4 4i, 16,0,...). 

Fig. 24.6 Discrete-time phase retrieval problem to recover $x$ from the Fourier intensities $|\mathcal{F}[x]|$
and $|\mathcal{F}[x + h]|$, where $h$ is a known reference signal as in Example 24.4. Besides the true solution
$x$, the non-trivial ambiguity $\tilde{x}$ is the only further solution of the problem, see [10]


Here, we have underlined the entry with index 0. Since h does not possess a linear 
phase, cf. Fig. 24.6c, there may exist a further non-trivially different solution by 
Theorem 24.9, and, indeed, the signal 


Žž = z3 (..., 0, 160 — 80i, —28 — 96i, —173 + 31i, 
95 — 44i, 76 + 16i, —120 — 44i, 40 — 8i, 0, ...) 


yields the same Fourier intensities. The signals and the given Fourier intensities are 
presented in Fig. 24.6. 


24.6.2 Interference with an Unknown Reference Signal 


Let us now consider the case where the finitely supported signal $h$ is also unknown.
For real signals, this problem has already been studied in [21]. For complex signals,
we want to refer to the work of Raz et al. [22], where, besides the three Fourier
intensities in the next theorem, a fourth intensity of the form $|\hat{x}(\omega) + i \hat{h}(\omega)|$ was
used for the recovery of $x$. From a theoretical point of view, this intensity is not
needed to ensure uniqueness, but this additional information allows the derivation of
an explicit analytic solution.


Theorem 24.10 ([8]) Let $x$ and $h$ be two discrete-time signals with finite support. If
the corresponding zero sets of the signals $x$ and $h$ are disjoint, then the two signals
$x$ and $h$ can be recovered from the Fourier intensities

\[
  |\mathcal{F}[x]|, \quad |\mathcal{F}[h]|, \quad\text{and}\quad |\mathcal{F}[x + h]|
\]

uniquely up to common trivial ambiguities.


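‘Common trivial ambiguities’ means here that we can multiply the two signals
x and h with the same unimodular constant $e^{\mathrm{i}\alpha}$, shift the two signals by the
same integer $n_0$, or take the reflection and conjugation of both signals; none of these
actions changes the given Fourier intensities in Theorem 24.10. For a detailed
proof of this theorem, we refer to [8]. The main part of the proof is heavily based on
the result of Theorem 24.2, where we have shown that each further solution $(\breve{x}, \breve{h})$
with Fourier intensities $|\mathcal{F}[\breve{x}]| = |\mathcal{F}[x]|$ and $|\mathcal{F}[\breve{h}]| = |\mathcal{F}[h]|$ is related to the true
solution (x, h) by some factorization


$$\hat{\breve{x}}(\omega) = e^{\mathrm{i}\alpha_1} e^{-\mathrm{i}\omega n_1} \, \overline{\hat{x}_1(\omega)} \, \hat{x}_2(\omega) \quad \text{for} \quad \hat{x}(\omega) = \hat{x}_1(\omega) \, \hat{x}_2(\omega)$$


and

$$\hat{\breve{h}}(\omega) = e^{\mathrm{i}\alpha_2} e^{-\mathrm{i}\omega n_2} \, \overline{\hat{h}_1(\omega)} \, \hat{h}_2(\omega) \quad \text{for} \quad \hat{h}(\omega) = \hat{h}_1(\omega) \, \hat{h}_2(\omega)$$


with rotations $\alpha_1, \alpha_2 \in [-\pi, \pi)$ and shifts $n_1, n_2 \in \mathbb{Z}$. The assertion then follows
from a detailed comparison of the third Fourier intensity


$$|\hat{\breve{x}}(\omega) + \hat{\breve{h}}(\omega)|^2 = |\hat{x}(\omega) + \hat{h}(\omega)|^2$$


by incorporating the product representations of the signals.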
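The invariance under common trivial ambiguities can be checked numerically. The following Python sketch is only an illustration and is not taken from [8]: the random test signals, the index window, the frequency grid, and the helper names `three_intensities` and `same` are assumptions of this example. It applies a common rotation, a common shift, and a common conjugate reflection to a pair (x, h) and compares the three Fourier intensities of Theorem 24.10. It also rotates only h, which in general changes the third intensity.

```python
import numpy as np

L = 8                                    # signals live on the index window -L..L
n = np.arange(-L, L + 1)
omega = np.linspace(-np.pi, np.pi, 200, endpoint=False)
E = np.exp(-1j * np.outer(omega, n))     # DTFT samples: (E @ x)[k] = sum_n x[n] e^{-i w_k n}

def three_intensities(x, h):
    """Return |F[x]|, |F[h]|, |F[x + h]| on the frequency grid."""
    return np.abs(E @ x), np.abs(E @ h), np.abs(E @ (x + h))

def same(p, q):
    """True if all three intensities agree up to floating-point tolerance."""
    return all(np.allclose(u, v) for u, v in zip(p, q))

rng = np.random.default_rng(1)           # arbitrary test data (assumption)
x = np.zeros(n.size, dtype=complex)
h = np.zeros(n.size, dtype=complex)
x[L:L + 4] = rng.standard_normal(4) + 1j * rng.standard_normal(4)      # support 0..3
h[L + 1:L + 4] = rng.standard_normal(3) + 1j * rng.standard_normal(3)  # support 1..3

alpha, n0 = 0.8, 2
rotate = lambda s: np.exp(1j * alpha) * s   # e^{i alpha} s
shift = lambda s: np.roll(s, n0)            # s[. - n0]; the support stays inside the window
reflect = lambda s: np.conj(s[::-1])        # conj(s[-.]) on the symmetric window

reference = three_intensities(x, h)
common_ok = [same(reference, three_intensities(op(x), op(h)))
             for op in (rotate, shift, reflect)]
only_h_rotated_ok = same(reference, three_intensities(x, rotate(h)))
print(common_ok, only_h_rotated_ok)
```

The last check illustrates why the ambiguities have to be common to both signals: the third intensity couples x and h.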

If the assumption of Theorem 24.10 is violated and x and h have common zeros 
in the defining zero sets in representation (24.5), then we may find more non-trivial 
solutions, as shown in the following example. 


Example 24.5 We want to recover the signal x in Example 24.4, which corresponds
to the zero set


$$\{\beta_1, \dots, \beta_6\} = \tfrac{1}{4} \, \{1 + \mathrm{i},\; 3 - 2\mathrm{i},\; -3 - \mathrm{i},\; -4 + 2\mathrm{i},\; 4 + 4\mathrm{i},\; 2 - 4\mathrm{i}\}$$


in the representation (24.5) of $\hat{x}$. Further, we choose the reference signal h with the
corresponding zero set


$$\{\eta_1, \dots, \eta_5\} = \tfrac{1}{4} \, \{1 + \mathrm{i},\; 4 + 4\mathrm{i},\; -4 - 3\mathrm{i},\; -4 + 2\mathrm{i},\; -4\mathrm{i}\}.$$


If both signals are unknown, we have to recover x and h from the Fourier intensities
$|\mathcal{F}[x]|$, $|\mathcal{F}[h]|$, and $|\mathcal{F}[x + h]|$. The intersection of the corresponding zero sets of
x and h is here given by

$$\tfrac{1}{4} \, \{1 + \mathrm{i},\; 4 + 4\mathrm{i}\}$$


such that the uniqueness of the solution is not covered by Theorem 24.10. Indeed,
reflecting the zeros $\tfrac{1}{4}(1 + \mathrm{i})$ and $\tfrac{1}{4}(4 + 4\mathrm{i})$ in the representations (24.5) of both
signals at the unit circle, we find a second non-trivial solution $(\breve{x}, \breve{h})$. Both solutions
(x, h) and $(\breve{x}, \breve{h})$ are presented in Fig. 24.7.
Fig. 24.7 Discrete-time phase retrieval problem to recover x and h from the Fourier intensities
$|\mathcal{F}[x]|$, $|\mathcal{F}[h]|$, and $|\mathcal{F}[x + h]|$, where h is an unknown reference signal as in Example 24.5.
Besides the true solution (x, h), the non-trivial ambiguity $(\breve{x}, \breve{h})$ is also a solution of the problem,
see [10]
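The construction in Example 24.5 can be reproduced numerically. The Python sketch below rests on assumptions that are not stated in this section: both signals are taken with leading coefficient 1 and with support starting at index 0, and the helper names are hypothetical. It builds x and h from the stated zero sets, reflects the two common zeros $\tfrac{1}{4}(1 + \mathrm{i})$ and $\tfrac{1}{4}(4 + 4\mathrm{i})$ at the unit circle in both signals, rescales by the moduli of the reflected zeros, and compares the three Fourier intensities.

```python
import numpy as np

def signal_from_zeros(zeros, lead=1.0):
    """Coefficients x[0..N-1] of lead * prod_j (z - beta_j), read as a polynomial in z = e^{-iw}."""
    return lead * np.poly(zeros)[::-1]       # np.poly lists the highest power first

def reflect_at_unit_circle(zeros, lead, chosen):
    """Replace each zero in `chosen` by 1/conj(beta); rescale so that |F[.]| is unchanged."""
    new_zeros = [1 / np.conj(b) if any(np.isclose(b, c) for c in chosen) else b
                 for b in zeros]
    new_lead = lead * np.prod([abs(c) for c in chosen])
    return new_zeros, new_lead

def three_intensities(x, h, M=64):
    """|F[x]|, |F[h]|, |F[x + h]| sampled at w = 2*pi*k/M (both signals start at index 0)."""
    X, H = np.fft.fft(x, M), np.fft.fft(h, M)
    return np.abs(X), np.abs(H), np.abs(X + H)

zeros_x = np.array([1 + 1j, 3 - 2j, -3 - 1j, -4 + 2j, 4 + 4j, 2 - 4j]) / 4
zeros_h = np.array([1 + 1j, 4 + 4j, -4 - 3j, -4 + 2j, -4j]) / 4
common = np.array([1 + 1j, 4 + 4j]) / 4      # zeros reflected in the example

x = signal_from_zeros(zeros_x)               # leading coefficient 1 is an assumption
h = signal_from_zeros(zeros_h)
x_alt = signal_from_zeros(*reflect_at_unit_circle(zeros_x, 1.0, common))
h_alt = signal_from_zeros(*reflect_at_unit_circle(zeros_h, 1.0, common))

matches = [np.allclose(u, v)
           for u, v in zip(three_intensities(x, h), three_intensities(x_alt, h_alt))]
print(matches, np.allclose(x, x_alt))
```

The check works because, on the unit circle, $|e^{-\mathrm{i}\omega} - \overline{\beta}^{-1}| = |e^{-\mathrm{i}\omega} - \beta| / |\beta|$, so the reflection multiplies both Fourier transforms by the same unimodular factor.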


24.6.3 Interference with the Modulated Signal 


Finally, we consider the model where the unknown reference signal is a modulated
version of the signal x itself. Similar approaches for the (periodic) discrete Fourier
transform have already been studied in [25, 41]. Here we rely especially on the
results in [24]. The discrete-time phase retrieval problem can now be posed as follows:
recover a finitely supported signal x from its Fourier intensity $|\hat{x}|$ and a set of
interference measurements


$$|\mathcal{F}[x + e^{\mathrm{i}\alpha} e^{\mathrm{i}\mu \cdot} x]|,$$


where the modulations and rotations are described by $\mu \in \mathbb{R}$ and $\alpha \in [0, 2\pi)$. In
order to guarantee uniqueness, besides the Fourier intensity $|\hat{x}|$, we merely need two
additional interference signals.


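Theorem 24.11 ([24]) Let x be a discrete-time signal with finite support of length
N. If $\mu$ satisfies the assumption that $k\mu \not\equiv 0 \bmod 2\pi$ for all $k = 1, \dots, 2N - 1$, then
the signal x can be uniquely recovered up to a rotation ambiguity from its Fourier
intensity $|\hat{x}|$ and the Fourier intensities of two interference signals


$$|\mathcal{F}[x + e^{\mathrm{i}\alpha_1} e^{\mathrm{i}\mu \cdot} x]| \quad \text{and} \quad |\mathcal{F}[x + e^{\mathrm{i}\alpha_2} e^{\mathrm{i}\mu \cdot} x]|,$$


where $\alpha_1$ and $\alpha_2$ are two real numbers satisfying $\alpha_1 - \alpha_2 \ne \pi k$ for all $k \in \mathbb{Z}$.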


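Proof Writing the unknown Fourier-transformed signal in the form $\hat{x}(\omega) =
|\hat{x}(\omega)| \, e^{\mathrm{i}\phi(\omega)}$, we only need to recover the phase $\phi(\omega)$ to solve the given phase retrieval
problem. The Fourier intensity measurements of the first interference signal yield


$$\begin{aligned}
|\mathcal{F}[x + e^{\mathrm{i}\alpha_1} e^{\mathrm{i}\mu \cdot} x](\omega)|^2 &= |\hat{x}(\omega) + e^{\mathrm{i}\alpha_1} \hat{x}(\omega - \mu)|^2 \\
&= |\hat{x}(\omega)|^2 + |\hat{x}(\omega - \mu)|^2 \\
&\quad + 2 \, |\hat{x}(\omega)| \, |\hat{x}(\omega - \mu)| \cos(\phi(\omega - \mu) - \phi(\omega) + \alpha_1)
\end{aligned}$$

and thus the cosine of the relative phase


$$\begin{aligned}
\cos(\phi(\omega - \mu) - \phi(\omega) + \alpha_1) &= \cos(\alpha_1) \cos(\phi(\omega - \mu) - \phi(\omega)) \\
&\quad - \sin(\alpha_1) \sin(\phi(\omega - \mu) - \phi(\omega)).
\end{aligned}$$


Analogously, we can extract $\cos(\phi(\omega - \mu) - \phi(\omega) + \alpha_2)$ from the Fourier intensity
measurements of the second interference signal. Since $\alpha_1 - \alpha_2 \ne \pi k$ for all $k \in \mathbb{Z}$,
we can therefore uniquely determine the phase difference $\phi(\omega - \mu) - \phi(\omega)$ for every
$\omega \in \mathbb{R}$. Obviously, the solution can only be recovered up to rotations. Taking an
arbitrary phase $\phi(\omega_0)$, we can compute the corresponding phases $\phi(\omega_0 + \mu k)$ for
$k = 0, \dots, 2N - 1$ and thus the Fourier values $\hat{x}(\omega_0 + \mu k)$ for $k = 0, \dots, 2N - 1$.
It remains to recover the signal x and especially the unknown support from these
Fourier values. Due to the support length N, the Fourier transform $\hat{x}$ can be written
in the form


$$\hat{x}(\omega) = e^{-\mathrm{i}\omega n_0} \sum_{n=0}^{N-1} c_n \, e^{-\mathrm{i}\omega n}$$


with $c_n := x[n + n_0]$. Using the found Fourier values, we obtain the equation system


$$\hat{x}(\omega_0 + \mu k) = \sum_{n=0}^{N-1} \big[ c_n \, e^{-\mathrm{i}\omega_0 (n + n_0)} \big] \, e^{-\mathrm{i}k\mu(n + n_0)} = \sum_{n=0}^{N-1} d_n \, z_n^k, \qquad k = 0, \dots, 2N - 1,$$


with $d_n := c_n \, e^{-\mathrm{i}\omega_0 (n + n_0)}$ and $z_n := e^{-\mathrm{i}\mu(n + n_0)}$. This system can be solved by Prony’s
method if the values $\omega_0 + \mu k \bmod 2\pi$ are pairwise different for $k = 0, \dots, 2N - 1$,
which means that $k\mu$ is not a multiple of $2\pi$ for all $k = 1, \dots, 2N - 1$, see for
instance [42].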
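The phase propagation used in this proof can be sketched in a few lines of Python. This is an illustration under assumptions, not the procedure of [24]: the test signal, the values of $\mu$, $\alpha_1$, $\alpha_2$, $\omega_0$ and all variable names are chosen for the example, the sampled Fourier values are assumed to be nonzero, and the final step uses a plain least-squares fit with known support in place of Prony's method.

```python
import numpy as np

def dtft(x, omega):
    """x_hat(w) = sum_n x[n] e^{-iwn} for a signal supported on n = 0..len(x)-1."""
    n = np.arange(len(x))
    return np.exp(-1j * np.outer(omega, n)) @ x

rng = np.random.default_rng(0)               # arbitrary test signal (assumption)
N = 5
x = rng.standard_normal(N) + 1j * rng.standard_normal(N)
mu, alpha1, alpha2, omega0 = 1.0, 0.3, 1.4, 0.2
omega = omega0 + mu * np.arange(2 * N)       # sampling points w_k = w0 + mu*k

n = np.arange(N)
modulated = np.exp(1j * mu * n) * x          # e^{i mu .} x has the DTFT x_hat(. - mu)

# Simulated measurements: intensities only, no phases.
mag = np.abs(dtft(x, omega))
I1 = np.abs(dtft(x + np.exp(1j * alpha1) * modulated, omega))
I2 = np.abs(dtft(x + np.exp(1j * alpha2) * modulated, omega))

# cos(delta_k + alpha_j) with delta_k = phi(w_{k-1}) - phi(w_k), for k = 1..2N-1.
denom = 2 * mag[1:] * mag[:-1]
c1 = (I1[1:] ** 2 - mag[1:] ** 2 - mag[:-1] ** 2) / denom
c2 = (I2[1:] ** 2 - mag[1:] ** 2 - mag[:-1] ** 2) / denom

# Addition theorem: c_j = cos(alpha_j) cos(delta) - sin(alpha_j) sin(delta).
# The 2x2 system has determinant sin(alpha1 - alpha2), nonzero by assumption.
A = np.array([[np.cos(alpha1), -np.sin(alpha1)],
              [np.cos(alpha2), -np.sin(alpha2)]])
cos_d, sin_d = np.linalg.solve(A, np.vstack([c1, c2]))
delta = np.arctan2(sin_d, cos_d)

# Fix phi(w_0) = 0 (rotation ambiguity) and propagate phi(w_k) = phi(w_{k-1}) - delta_k.
phase = np.concatenate([[0.0], -np.cumsum(delta)])
recovered = mag * np.exp(1j * phase)

truth = dtft(x, omega)
aligned = truth * np.exp(-1j * np.angle(truth[0]))    # remove the global rotation

# Stand-in for Prony's method: least squares with known support 0..N-1 (assumption).
V = np.exp(-1j * np.outer(omega, n))
x_rec = np.linalg.lstsq(V, recovered, rcond=None)[0]
print(np.allclose(recovered, aligned))
```

Prony's method is needed in the theorem because the support position $n_0$ is unknown; the least-squares step above sidesteps this by assuming $n_0 = 0$.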


24.7 Linear Canonical Phase Retrieval 


Up to this point, we have assumed that the given measurements in the frequency 
domain arise from the Fourier-transformed true signal. These measurements can 
be seen as intensities in the so-called far field in Fourier optics. In this section, we 
briefly investigate the question: how do the established uniqueness guarantees change 
if we replace the far field intensity measurements for example by near field intensity 
measurements—what happens if we replace the Fourier transform by the Fresnel or 
fractional Fourier transform? 

The discrete-time Fourier, fractional Fourier, and Fresnel transform are special
cases of the so-called linear canonical transform. Referring to [43, 44], for the real
parameters a, b, c, and d with $ad - bc = 1$ and $b \ne 0$, we define the discrete-time
linear canonical transform of the signal $x := (x[n])_{n \in \mathbb{Z}}$ by

$$\mathcal{C}_{(a,b,c,d)}[x](\omega) := \sum_{n \in \mathbb{Z}} x[n] \, K_{(a,b,c,d)}(\omega, n),$$

where the kernel $K_{(a,b,c,d)}$ is given by


$$K_{(a,b,c,d)}(\omega, t) := \frac{1}{\sqrt{2\pi b}} \, e^{-\mathrm{i}\frac{\pi}{4}} \, e^{\frac{\mathrm{i}}{2} \left( \frac{a}{b} t^2 - \frac{2}{b} \omega t + \frac{d}{b} \omega^2 \right)}.$$


The inverse discrete-time linear canonical transform is given by 


TH 


ah, alln] = J ¥ (w) Karan (ort) di, 
Be 


see [43, 44]. The classical discrete-time Fourier transform $\mathcal{F}$ coincides with the
linear canonical transform $\mathcal{C}_{(0,1,-1,0)}$ up to a multiplicative constant $\vartheta := \vartheta_{(a,b,c,d)} :=
\frac{1}{\sqrt{2\pi b}} \, e^{-\mathrm{i}\frac{\pi}{4}}$. The discrete-time Fresnel transform [45] and the fractional Fourier
transform [46] are covered by the linear canonical transforms C(1,1/:.,0,1) and 
$\mathcal{C}_{(\cos\alpha, \sin\alpha, -\sin\alpha, \cos\alpha)}$ with $\alpha \in \mathbb{R}$, respectively.

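Since $b \ne 0$ by assumption, we can rewrite the linear canonical transform with
respect to the discrete-time Fourier transform as


$$\mathcal{C}_{(a,b,c,d)}[x](\omega) = \vartheta \, e^{\mathrm{i}\frac{d}{2b}\omega^2} \, \mathcal{F}\big[ e^{\mathrm{i}\frac{a}{2b} \cdot^2} x \big]\big(\tfrac{\omega}{b}\big). \qquad (24.12)$$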
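The relation (24.12) can be checked numerically. The Python sketch below is a hedged illustration, not code from [43, 44]: it assumes the kernel $K_{(a,b,c,d)}(\omega, t) = \frac{1}{\sqrt{2\pi b}} e^{-\mathrm{i}\pi/4} e^{\frac{\mathrm{i}}{2}(\frac{a}{b}t^2 - \frac{2}{b}\omega t + \frac{d}{b}\omega^2)}$ with $b > 0$, uses fractional-Fourier parameters and a random test signal chosen for the example, and the function names `lct` and `dtft` are hypothetical.

```python
import numpy as np

def lct(x, idx, omega, a, b, d):
    """Discrete-time linear canonical transform by direct summation of the kernel (b > 0 assumed)."""
    theta = np.exp(-1j * np.pi / 4) / np.sqrt(2 * np.pi * b)
    w, n = omega[:, None], idx[None, :]
    kernel = theta * np.exp(0.5j * (a / b * n**2 - 2 / b * w * n + d / b * w**2))
    return kernel @ x

def dtft(x, idx, omega):
    """Discrete-time Fourier transform F[x](w) = sum_n x[n] e^{-iwn}."""
    return np.exp(-1j * np.outer(omega, idx)) @ x

rng = np.random.default_rng(2)               # arbitrary test signal (assumption)
idx = np.arange(6)
x = rng.standard_normal(6) + 1j * rng.standard_normal(6)
omega = np.linspace(-3.0, 3.0, 121)

# Fractional Fourier parameters: (a, b, c, d) = (cos t, sin t, -sin t, cos t), so ad - bc = 1.
t = 0.9
a, b, d = np.cos(t), np.sin(t), np.cos(t)
theta = np.exp(-1j * np.pi / 4) / np.sqrt(2 * np.pi * b)

lhs = lct(x, idx, omega, a, b, d)
chirped = np.exp(1j * a / (2 * b) * idx**2) * x          # e^{i a/(2b) n^2} x[n]
rhs = theta * np.exp(1j * d / (2 * b) * omega**2) * dtft(chirped, idx, omega / b)

# Special case (0, 1, -1, 0): the transform is the DTFT times the constant for b = 1.
theta_fourier = np.exp(-1j * np.pi / 4) / np.sqrt(2 * np.pi)
fourier_case = lct(x, idx, omega, 0.0, 1.0, 0.0)
print(np.allclose(lhs, rhs), np.allclose(fourier_case, theta_fourier * dtft(x, idx, omega)))
```

Since the prefactor in (24.12) is unimodular up to the constant $|\vartheta|$, intensities of the linear canonical transform are rescaled Fourier intensities of the chirped signal.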


Let us now consider the linear canonical phase retrieval problem, where we wish
to recover a complex-valued discrete-time signal $x := (x[n])_{n \in \mathbb{Z}}$ with finite support
from the intensity $|\mathcal{C}_{(a,b,c,d)}[x]|$ of its linear canonical transform. Similarly to the
Fourier phase retrieval problem, we are particularly interested in the arising ambiguities
and in uniqueness guarantees. Using the alternative formulation (24.12) that
relates the linear canonical transform to the discrete-time Fourier transform, it can
easily be seen that the linear canonical phase retrieval problem to recover the true
signal x is also solved by


1. the rotated signal $e^{\mathrm{i}\alpha} x$ with $\alpha \in \mathbb{R}$,

2. the shifted signal $e^{-\mathrm{i}\frac{a}{b} n_0 \cdot} \, x[\cdot - n_0]$ with $n_0 \in \mathbb{Z}$, and
3. the conjugated and reflected signal $e^{-\mathrm{i}\frac{a}{b} \cdot^2} \, \overline{x[-\cdot]}$.

Again, these trivial ambiguities are of minor interest.
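The three trivial ambiguities listed above can be verified numerically. The sketch below is illustrative only and makes several assumptions: the kernel has the form $\frac{1}{\sqrt{2\pi b}} e^{-\mathrm{i}\pi/4} e^{\frac{\mathrm{i}}{2}(\frac{a}{b}t^2 - \frac{2}{b}\omega t + \frac{d}{b}\omega^2)}$ with $b > 0$, the modulation factors are $e^{-\mathrm{i}\frac{a}{b} n_0 n}$ for the shift and $e^{-\mathrm{i}\frac{a}{b} n^2}$ for the conjugate reflection, and the test signal, parameters, and the helper name `lct_intensity` are chosen for the example.

```python
import numpy as np

def lct_intensity(values, idx, omega, a, b, d):
    """|C_(a,b,c,d)[x](w)| for a signal with entries `values` at integer indices `idx` (b > 0)."""
    theta = np.exp(-1j * np.pi / 4) / np.sqrt(2 * np.pi * b)
    w, n = omega[:, None], idx[None, :]
    kernel = theta * np.exp(0.5j * (a / b * n**2 - 2 / b * w * n + d / b * w**2))
    return np.abs(kernel @ values)

t = 0.9                                      # fractional Fourier angle (assumption)
a, b, d = np.cos(t), np.sin(t), np.cos(t)    # c = -sin(t), so ad - bc = 1

rng = np.random.default_rng(3)               # arbitrary test signal (assumption)
idx = np.arange(5)
x = rng.standard_normal(5) + 1j * rng.standard_normal(5)
omega = np.linspace(-3.0, 3.0, 101)
alpha, n0 = 1.1, 3

reference = lct_intensity(x, idx, omega, a, b, d)

# 1. Rotation: e^{i alpha} x.
rotated = lct_intensity(np.exp(1j * alpha) * x, idx, omega, a, b, d)

# 2. Modulated shift: y[m] = e^{-i (a/b) n0 m} x[m - n0], supported on idx + n0.
m = idx + n0
shifted = lct_intensity(np.exp(-1j * a / b * n0 * m) * x, m, omega, a, b, d)

# 3. Modulated conjugate reflection: y[m] = e^{-i (a/b) m^2} conj(x[-m]), supported on -idx.
m = -idx
reflected = lct_intensity(np.exp(-1j * a / b * m**2) * np.conj(x), m, omega, a, b, d)

# For comparison: a plain shift without the modulation factor.
plain_shift = lct_intensity(x, idx + n0, omega, a, b, d)

results = [np.allclose(reference, r) for r in (rotated, shifted, reflected)]
print(results, np.allclose(reference, plain_shift))
```

For $a \ne 0$, the unmodulated shift translates the intensity along the frequency axis, which is why the shift and reflection ambiguities carry the additional chirp factors.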


The representation (24.12) implies that the linear canonical phase retrieval problem
to recover x from $|\mathcal{C}_{(a,b,c,d)}[x]|$ is equivalent to the recovery of the signal
$(x[n] \, e^{\mathrm{i}\frac{a}{2b} n^2})_{n \in \mathbb{Z}}$ from the Fourier intensity $|\mathcal{F}[x[\cdot] \, e^{\mathrm{i}\frac{a}{2b} \cdot^2}]|$, and we can immediately
transfer the characterization of the complete solution set in Theorem 24.1 to the new
setting.


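Theorem 24.12 ([44]) Let x be a discrete-time signal with finite support. Then each
signal y with finite support satisfying


$$|\mathcal{C}_{(a,b,c,d)}[y]| = |\mathcal{C}_{(a,b,c,d)}[x]|$$

is characterized by


$$\mathcal{F}\big[ \vartheta \, e^{\mathrm{i}\frac{a}{2b} \cdot^2} y \big](\omega) = e^{\mathrm{i}(\alpha + \omega n_0)} \sqrt{ |a[N-1]| \prod_{j=1}^{N-1} |\beta_j|^{-1} } \; \prod_{j=1}^{N-1} \big( e^{-\mathrm{i}\omega} - \beta_j \big),$$


where $\alpha$ is a real number, $n_0$ is an integer, and $\beta_j$ is chosen from the zero pair
$(\gamma_j, \overline{\gamma}_j^{-1})$ of the associated polynomial $P_a$ with respect to the autocorrelation signal a
of $\vartheta \, e^{\mathrm{i}\frac{a}{2b} \cdot^2} x[\cdot]$.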


With the characterization of trivial and non-trivial ambiguities, the linear canonical 
phase retrieval problem also inherits the uniqueness guarantees of the discrete-time 
Fourier phase retrieval problem; so the solutions in linear canonical phase retrieval 
are almost always unique (up to trivial ambiguities) if we have access to further 
magnitudes or phases in the time-domain, cf. Sect. 24.5, or if further interference 
measurements are available, cf. Sect. 24.6. 
Detector size, 393
Dichroic imaging, 517 
Diffract and destroy, 437 
Diffraction 
Fresnel integral, 43 
scalar theory, 36 
single molecule, 436 
Diffractive optics, 102 
Direct reconstruction, 394 
Distribution 
Bernoulli, 132 
Binomial, 133 
Multinomial, 131 
Normal, 135 




Poisson, 129 
Dreiklang, 244 
Dronpa, 245 
Drosophila melanogaster, 220 
Duality, 319 
DyMIN, 16 


E 

Einstein, Albert, 71 

Electron microscopy, 162, 357 

Entire functions, 386 

Epoch, 334 

e-set regularity, 591 

Error bound, 584 

European Synchrotron Radiation Facility 
(ESRF), 570, 577 

EUV light generation, 523 

Ewald sphere, 437 

Exact penalization, 319 

Experimental data, 400 

Extreme Ultraviolet (EUV), 523 


F 
Feasibility, 174, 180, 184 
inconsistent, 585 
problem, 172, 178, 182 
Femtosecond x-ray pulses, 436 
Fenchel conjugate, 315, 319, 320 
Fenchel convex conjugate, 154 
Field-of-view, 393 
FilamentSensor, 264 
Filtered back-projection, 65, 355, 366+, 400 
Finite difference calculation, 96
Finite support 
discrete time, 606 
Firm nonexpansive, 584 
Fixed point iteration, 175 
FLIM microscopy, 228 
Fluorescence lifetime, 228 
Fluorescence microscopy, 9, 147, 162, see 
also microscopy 
Focused ion beam, 106, 503, 565 
Förster Resonance Energy Transfer (FRET),
126, 129, 139, 228, 235+, 249, 437 
Forward operator/forward map/forward 
model, 377, 380-382, 397, 398 
Fourier intensity 
one-dimensional, 606 
Fourier Shell Correlation (FSC), 445 
Four level system, 19 
Fractional Fourier transform, 623 
Franklin, Rosalind, 35 
Fréchet-derivative, 381, 395
Free Electron Laser (FEL), 424, 436, 458 
Free-space propagator, 42 
paraxial approximation, 43 
Fresnel 
diffraction integral, 43 
number, 43+, 58+, 107, 166, 339+, 378, 
390, 317, 575 
propagator, 63, 166, 355, 378, 385, 389+ 
reflectivity, 77 
scaling theorem, 46 
zone plates, 102 
Fresnel, Augustin Jean, 77 
Gabor, Dennis, 54

Generalized SART, 398+ 

Generalized Tikhonov regularization, 149 

Gerchberg-Saxton, 62, see also algorithm 

Göttingen Instrument for Nano-Imaging 
with X-rays (GINIX), 116, 400, 569,
572 

Graphical Processing Unit (GPU), 400 

Green Fluorescent Protein (GFP), 242 


H 
Hadamard, Jacques, 145 
Helmholtz equation, 36 
High Harmonic Generation (HHG), 501, 523 
Hilbert, David (apocryphal), 165 
Hologram, 378 
multiple, 382, 383, 391, 392 
Holographic reconstruction, 388 
Holographic reference, 386 
Holography, 54+, 73, 89, 378+, 420, 574 
cone-beam, 341+ 
inline, 55 
reconstructions, 197, 386, 407+ 
reference, 386 
Homogeneous object, 379, 381, 384, 391 
Human embryonic kidney cells, 213 
Human mesenchymal stem cells (hMSCs), 
264 
Hybrid-input-output algorithm, 63, see also 
algorithm 
Hypothesis testing, 286+ 
Bonferroni correction, 294 
false detections, 290 
family-wise error rate (FWER), 294 
level, 287 
multiplicity effect, 292 
multiscale scanning, 299 
I

Ill-posed, 387, 394, 585 

Image reconstruction, 377 

Incomplete data, 389 

Index of refraction, 36 

Inline holography, 55, see also holography 

Intensity, 38 

Interfacial roughness, 80 

Interference, 618, 619, 622 

Inverse problem, 145, 377, 379 

IrisFP, 245 

Iteratively regularized Gauss-Newton
method, 155, see also algorithm

Iterative reconstruction, 394, 400 

Iterative reprojection phase retrieval, 65 


K 

Kaczmarz method, 377, 398, see also algorithm

Kirkpatrick-Baez mirror, 83 


L 

Labelled double-stranded DNA, 212 

Landweber iteration, 154, see also algorithm

Laser-produced plasma, 549 

Law of small numbers, 133 

Levenberg-Marquardt algorithm, 155, see 
also algorithm 

Lindeberg central limit theorem, 136 

Linear canonical transform, 623 

Linearized model, 381, 387, 395 

Linear regularity, 593 

Lipschitz-stability, 387 

LiveACT, 277 

Local degree of ill-posedness, 159 

Lower level set, 319 

Low-frequency instability, 393, 400 


M 
Metal-Induced Energy Transfer Imaging 
(MIET), 126, 139, 228+
Method of moments estimator, 219 
Metric regularity, 589 
Metric subregularity, 584, 589 
Microfluidics, 409 
Microscopy 
confocal, 13 
fluorescence, 9, 147, 162 
resolution, 11 
soft x-ray, 72, 105+, 427, 458, 468+, 531,
549+, 552, 561, 566 
stimulated emission depletion (STED), 
17 
Model 
Bernoulli, 132 
Binomial, 133 
Gaussian, 135 
Poisson, 128 
Molecular contribution function, 207, 222 
Molecular counting, 213, 222 
Monotone 
strongly, 316 
Monte Carlo, 443 
Moreau’s identity, 154 
Morozov’s discrepancy principle, 148 
Multilayer mirror, 85
Multilayer Laue lens, 105, 107 
Multilayer reflectivity, 77 
Multilayer Zone Plates (MZP), 106, 561 
Multiple holograms, 382-384, 391, 392 
Multiscale scanning, 299, see also hypothesis testing
Mutual intensity, 109 
Nanoscopy, 3+, 205+
RESCue-STED, 16 
RESOLFT, 215, 241 
stimulated emission depletion (STED), 

16+, 208 

NEXAFS spectroscopy, 553 

Noise-level, 388 

Nonexpansive, 588 

Nonlinear reconstruction, 395, 400 

Non-negativity constraint, 379 

Non-uniqueness, 383 

Normal cone, 586 

Numerical aperture, 394 
Optical stretcher, 413
Optimality, 392 
Orientation field, 274 
Parallelization, 399

Paraxial Helmholtz equation, 41 
Penalty functional, 149 

Penalty term, 396 

Phase contrast imaging, 377 
Phase contrast tomography, 339+, 358+,
377+, 379, 382, 387, 393, 396 
Phase-image, 378, 396 
Phase lift, 190 
Phase object / non-absorbing object, 379, 
381, 400 
Phase problem, 384 
Phase retrieval, 166, 172, 385, 386, 437, 503 
constrained, 614-616 
CTF-based, 60 
discrete-time, 606 
one-dimensional, 604 
phase sets, 168 
transfer function, 517 
Phase-wrapping, 384 
Photocatalysis, 470 
Pink Laue mode, 464 
Plasmon enhancement, 523 
Plasmonic nanostructures, 523 
Point Spread Function (PSF), 15, 147 
Pointwise almost averaged, 584 
Pointwise quadratically supportable, 328 
Poisson noise, 438 
Polarization, 4 
Polyhedral, 316 
Positive polynomials, 611 
Poynting vector, 38, 532 
Primal dual methods, 327 
Problem 
feasibility, 172, 178, 182, 583+ 
ill-posed, 146 
inverse, 145 
well-posed, 146 
Projection, see projector 
Projection approximation, 51 
Projector, 170, 586+ 
alternating, 182, 190, 597 
approximation, 51 
averaged, 183 
cyclic, 178, 593+ 
gradient, 183 
tomographic, 377 
Propagated data noise error, 148 
Proximal map, 316 
Proximal mapping, 176, 316 
Proximal method, 176 
Proximal reflector, 176, 185 
Proximity operators, 153, see also proximal 
mapping 
Prox-regular, 186, 591, 592 
ProxToolbox, 166, 177-199 
Ptychography, 167, 172, 407, 418, 573 
PtyNAMi, 573 
Pulsed laser deposition, 565
Pure phase object/non-absorbing object, 
379, 381, 400 
Radon transform, 65, 356, 387

Reconstruction artifacts, 400 

Reflector, 171, 316, 586 

Regularization, 146, 395 

Regularization method, 148 

Regularized Newton methods, 377, 394, 
395, 398 

Relaxed-Averaged-Alternating-Reflections 
(RAAR) algorithm, 64, 445, 506, see 
also algorithm 

RESCue-STED, 16, see also nanoscopy 

Residual fluorescence, 248 

RESOLFT, 242, see also nanoscopy 

RESOLFT nanoscopy, 215 

Resolution, 11 

limit, 394 
Rayleigh, 11 

Reversibly Photo-Switchable Fluorescent 

Proteins (RSFP), 242 


S 
Saddle point problem, 319 
Scalar diffraction theory, 36 
Scanning SAXS, 406, 578 
Scanning WAXS, 576 
Signal 
non-negative, 611 
Silver-behenate, 470 
Single molecule diffraction, 436 
Sobolev norm, 160 
Soft X-ray, 549 
Soft X-ray microscopy, 552 
Source function, 157 
Hölder, 157 
logarithmic, 157 
Spherical harmonics, 439 
Stability, 377, 387, 390, 393, 396 
Stability estimate, 158, 387, 390-392 
Statistical Multi-Resolution Estimation 
(SMRE), 321 
Statistical thinning, 130, 132, 137 
Stimulated Emission Depletion (STED), see 
nanoscopy 
basics, 17 
resolution, 25 
STED nanoscopy, 208 
Stress fibers, 264 
Subregularity
linear regularity, 593 
Subtransversality, 593+ 
Support constraint, 379, 386, 388, 393, 394, 
396 
Switchable fluorophores, 215 
Switching fatigue, 249 
Switching speed, 248 
Synchrotron radiation, 113, 457 


T 
Takagi-Taupin, 86 
Takagi-Taupin equation, 107 
Theorem 
de Moivre-Laplace, 135 
law of small numbers, 133 
Lindeberg central limit, 136 
Three-photon correlation, 438 
Tomographic consistency, 379, 397 
Tomographic projection, 382 
Total external reflection, 75 
Twin-image, 389 
Ultrafast crystallography, 462
Ultrafast x-ray spectroscopy, 468 
Uncertainty principle, 390 
Uniqueness, 377, 383, 384, 386, 387 
Variance stabilization, 138


W 
Water window, 552 
Wavefront sensing, 532 
Hartmann, 532 
Waveguide, 88 
Waveguide optics, 358 
Waveguiding effects, 512 
Weakly sharp sets, 317, 319 
Weak scattering, 381 
Weak-sharp minima, 584 
Well-posed, 388 
Wigner distribution, 540 
X-ray
focusing, 72 
interfacial roughness, 80 
mirrors, 83 
propagation, 35, 95
reflectivity, 74 
waveguide fabrication, 97 
waveguides, 89 
Zernike, Frits, 111

Zone-plate 
Fresnel zone plates (FZP), 563 
multilayer zone plates (MZP), 562, 563 
on glass wire, 566 
volume effects, 107, 564 

Zone plate fabrication, 105 

Z-test, 287 


